THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


Interior and Exterior Means Obtained by the Method of Moments.
Edward L. Dodd


On the Chi-Square Distribution for Small Samples.
Paul G. Hoel


Shortest Average Confidence Intervals from Large Samples.
S. S. Wilks


Transformations of the Pearson Type III Distribution.
C. A. OLSHEN


A Test of the Significance of the Difference Between Means of
Samples from Two Normal Populations Without Assuming
Equal Variances. Daisy M. Starkey

Some Efficient Measures of Relative Dispersion. Nilan Norris

Notes on the Distribution of the Geometric Mean.
Burton H. Camp


Note on a Formula for the Multiple Correlation Coefficient.
H. M. Bacon


Vol. IX, No. 3 — September, 1938 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor


A. T. CRAIG J. NEYMAN 


WITH THE COOPERATION OF 


H. C. Carver      R. A. Fisher      R. de Mises
H. Cramér         T. C. Fry         E. S. Pearson
W. E. Deming      H. Hotelling      H. L. Rietz
G. Darmois        W. A. Shewhart

Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.


The subscription price for the ANNALS is $4.00 per year. Single copies $1.25.
Back numbers are available at the following rates:


Vols. I-IV $5.00 each. Single numbers $1.50. 
Vols. V to date $4.00 each. Single numbers $1.25. 




Subscriptions, renewals, orders for back numbers and other business
communications should be sent to A. T. Craig, University of Iowa, Iowa City, Iowa.


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics.


COMPOSED AND PRINTED AT THE 
WAVERLY PRESS, Inc. 
BALTIMORE, MD., U. S. A.







INTERIOR AND EXTERIOR MEANS OBTAINED BY THE 
METHOD OF MOMENTS 
By Edward L. Dodd

1. Introduction: The Substitutive Mean. A very general mean based upon
substitution was proposed by O. Chisini.¹ Briefly stated, this mean $M$ of
data $x_1, x_2, \dots, x_n$ is a number which satisfies some equation of the form

(1) $G(M, M, \dots, M) = G(x_1, x_2, \dots, x_n).$

If, now,²

(2) $M = F(x_1, x_2, \dots, x_n)$

is an explicit expression of $M$, then for each value $c$ which each of the arguments
$x_i$ can take on,

(3) $F(c, c, \dots, c) = c;$

or at least one value of this $F$ is $c$.

Suppose now that $F(x_1, x_2, \dots, x_n)$ is any function of $x_1, x_2, \dots, x_n$,
defined for at least one set of equal arguments $c$, and such that whenever
defined for equal arguments $c$, at least one value of $F(c, c, \dots, c) = c$. Such a
function I have called a substitutive mean. Various extensions³ are immediate,
such as the use of integration in place of summation. Indeed, point set
functions or functionals may be used.⁴ Here I shall supplement (3) by a fairly
common convention. If $F(c, c, \dots, c)$ is not originally defined, but as $x_i \to c$
simultaneously, limit $F(x_1, x_2, \dots, x_n) = c$, in this case $F(c, c, \dots, c)$ will
be assigned its limiting value $c$, thus establishing continuity.


2. Location and scale as means. The purpose of this paper is to investigate
the nature of the means which arise when the well known Method of Moments
is used to estimate the values of two important parameters, namely the
location $\kappa$ and the scale $\alpha$ of a frequency function or distribution. These are taken
as associated with the variable $x$ of the distribution thus:


(4) $t' = (x - \kappa)/\alpha.$


¹ O. Chisini, "Sul Concetto di media," Periodico di Matematiche, Series 4, Vol. 9 (1929),
pp. 106-116.

² E. L. Dodd, "Internal and External Means Arising from the Scaling of Frequency
Functions," These Annals, Vol. 8 (1937), pp. 18-20.

³ For an extension of Chisini's results, see: Bruno de Finetti, "Sul Concetto di media,"
Giornale dell' Instituto Italiano degli Attuari, Vol. 2 (1931), pp. 369-396.

⁴ E. L. Dodd, "The Chief Characteristic of Statistical Means," Cowles Commission
lecture. Colorado College Publication, General Series No. 208 (1936), pp. 89-92.




The nature of the distribution is then "specified" by

(5) $y = \alpha^{-1}\,\Phi(t');$

where $\Phi$ may contain other parameters, but in $\Phi$ the $\kappa$ and $\alpha$ appear only in the
$t'$ given by (4). For this mode of approach, the reader is referred to R. A.
Fisher.⁵

The other parameters which may appear in ® will not be considered in this 
paper. 

The parameters $\kappa$ and $\alpha$ are in general unknown and unknowable. However,
we attempt to get close estimates $k$ and $a$, of $\kappa$ and $\alpha$ respectively, from a set of
observations

(6) $X_1, X_2, \dots, X_n.$

To accomplish this, we have to solve certain equations formed in some way from

(7) $t = (x - k)/a,$

and

(8) $y = a^{-1}\,\Phi(t).$


These equations (7) and (8) result from (4) and (5) by substituting estimates
$k$ and $a$, respectively, for parameters $\kappa$ and $\alpha$.

Now the Method of Moments equates the theoretic moments (those obtained
from some such equation as (8) with $t$ replaced by its value in (7)) to the
moments obtained from the observations (6).

For the following discussion it will be useful to obtain "auxiliary" moments
from the $\Phi(t)$ in (8) before substitution is made from (7). Such moments, then,
do not depend at all upon the values ultimately assigned to $k$ and $a$. It is
supposed that

(9) $\int_{-\infty}^{\infty} \Phi(t)\,dt = 1,$

so that $\Phi(t)$ gives probability or relative frequency. Here, for finite
distributions, $\Phi(t) = 0$ outside the interval of the distribution. We shall assume the
existence of the first moment

(10) $\mu = \int_{-\infty}^{\infty} t\,\Phi(t)\,dt,$

and of the variance

(11) $\sigma^2 = \int_{-\infty}^{\infty} (t - \mu)^2\,\Phi(t)\,dt = \int_{-\infty}^{\infty} t^2\,\Phi(t)\,dt - \mu^2;$

and we shall assume that $\sigma > 0$, to eliminate a degenerate case.


⁵ R. A. Fisher, "On the Mathematical Foundation of Theoretical Statistics," Philosophical
Transactions of the Royal Society of London, Series A, Vol. 222 (1921), pp. 309-368.

















For the empirical moments of (6), we write

(12) $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n = \sum X_i/n,$

(13) $S = \sum X_i^2/n = \sum (X_i - \bar{X})^2/n + \bar{X}^2.$

These two moments are, by the Method of Moments, equated respectively to

(14) $M_1 = a^{-1}\int_{-\infty}^{\infty} x\,\Phi[(x - k)/a]\,dx,$

(15) $M_2 = a^{-1}\int_{-\infty}^{\infty} x^2\,\Phi[(x - k)/a]\,dx.$

But, from (7), (10), and (11) it is easy to see that

(16) $M_1 = k + a\mu,$

(17) $M_2 = k^2 + 2ka\mu + a^2(\sigma^2 + \mu^2);$

from which it follows that

(18) $M_2 - M_1^2 = a^2\sigma^2.$


Suppose now that $\tau^2$ is the empirical variance,

(19) $\tau^2 = \sum (X_i - \bar{X})^2/n.$


It follows from (12) and (13), that if $M_1 = \bar{X}$ and $M_2 = S$, as the Method of
Moments requires, then

(20) $a^2 = \tau^2/\sigma^2 = (\tau/\sigma)^2.$

And, from (16),

(21) $k = \bar{X} - a\mu.$


These results may be expressed in the following theorem:
THEOREM I. The estimated scale $a$ in

(8) $y = a^{-1}\,\Phi(t),$

where by (7), $t = (x - k)/a$, as obtained by the Method of Moments from
observations $X_1, X_2, \dots, X_n$, is the root-mean-square of $|X_i - \bar{X}|/\sigma$, where $\bar{X}$ is the
arithmetic mean of the $X_i$'s, and $\sigma^2$ is the "theoretic" variance of $\Phi(t)$ itself, as a
function of $t$, with no reference to the $k$ or $a$ in (7).

Moreover, the estimated location $k$ is a substitutive mean, characterized by
(3), and given by

(21) $k = \bar{X} - a\mu = \bar{X} - \mu\,[\sum (X_i - \bar{X})^2/\sigma^2 n]^{1/2}.$


As regards this final statement, it will be seen that if each $X_i = c$, then $\bar{X} = c$;
and hence $k = c$, as required by (3). We may say, then, that the right
member of (21), obtained as the formal solution of equations which the problem sets
up, is a substitutive mean of the elements $X_i$.
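The estimates in (19), (20) and (21) can be sketched in a few lines of Python. This is only an illustration: the function name `moment_estimates` and the sample values are hypothetical, and the first moment `mu` and standard deviation `sigma` of $\Phi(t)$ are assumed to be supplied by the reader from whatever specification has been chosen.

```python
import math

def moment_estimates(xs, mu, sigma):
    """Return (a, k) as in equations (20) and (21).

    mu, sigma: first moment and standard deviation of Phi(t),
    assumed known from the chosen specification (sigma > 0).
    """
    n = len(xs)
    xbar = sum(xs) / n
    # Empirical standard deviation tau, equation (19).
    tau = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    a = tau / sigma        # equation (20)
    k = xbar - a * mu      # equation (21)
    return a, k

# Hypothetical data; with mu = 0 the location estimate is the arithmetic mean.
a, k = moment_estimates([2.0, 4.0, 4.0, 6.0], mu=0.0, sigma=1.0)

# Equal observations: a = 0 and k = c, the substitutive property (3).
a_eq, k_eq = moment_estimates([3.0, 3.0, 3.0, 3.0], mu=5.0, sigma=2.0)
```

With equal observations the computed scale is zero, which is the degenerate case the text treats by continuity.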



















But if each $X_i = c$, then $a = 0$ in (21); and this $a$ may not be used as a scale.
However, if any two $X_i$'s are different, $a \ne 0$. And it is evident that as $X_i \to c$
simultaneously, limit $k = c$. If, then, we consider that the right member of (21)
is not originally defined for equal values $c$ of the elements $X_i$, it is to be given
its "continuity" value $c$, in accordance with the common convention already
mentioned.

In the special case where the function $\Phi$ in (5) chosen to specify the
distribution has a first moment $\mu$ equal to zero, the estimate $k$ of location given by (21)
is seen at once to be the arithmetic mean of the observations $X_1, X_2, \dots, X_n$.





3. External means. In the papers cited, Chisini and de Finetti gave
examples of external means. Indeed, it is not difficult to find means which do not
conform to the condition of internality:


(22) Minimum $(X_i) \le$ Mean $(X_i) \le$ Maximum $(X_i)$.


As a simple illustration, suppose that there are just three measurements $X_1 = 1$,
$X_2 = 1$, $X_3 = -2$. The standard deviation $\sqrt{2}$ is greater than each
measurement; it is an external mean. In this case also, the estimate of scale mentioned
in Theorem I is an external mean of $(X_i - \bar{X})/\sigma$. But, it may be noted that $a$,
the estimate of scale, is an internal mean of $|X_i - \bar{X}|/\sigma$.
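The three-measurement illustration can be checked numerically. The sketch below takes $\sigma = 1$ for simplicity (an assumption made only for this check) and uses hypothetical variable names.

```python
import math

xs = [1.0, 1.0, -2.0]
n = len(xs)
xbar = sum(xs) / n
# Root-mean-square deviation from the mean (the standard deviation).
sd = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)

# External with respect to the measurements themselves.
is_external = sd > max(xs) or sd < min(xs)

# Internal with respect to the absolute deviations |X_i - mean|.
abs_dev = [abs(x - xbar) for x in xs]
is_internal_abs = min(abs_dev) <= sd <= max(abs_dev)
```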

However, it will be shown now that the estimate $k$ of location may be an
external mean, with an externality not "removable" by the simple device of using
absolute values.

And it may be noted that in the earlier paper cited, I found by the Method of
Maximum Likelihood estimates of the scale $\alpha$ whose externality was likewise not removable.
THEOREM II. If for the function $\Phi(t)$ in (8), the second moment is less than twice
the square of the first moment, then the estimated location given by


(21) $k = \bar{X} - a\mu$


is an external mean of the measurements $X_i$, if these are all numerically equal,
half of them positive and the other half negative.

Proof. Let the positive measurements be $c$, and the negative measurements
be $-c$. Then $\bar{X} = 0$; also in (19), $\tau = c$. Hence from (20), $a = c/\sigma$. But by
hypothesis, the second moment $\sigma^2 + \mu^2$ of $\Phi(t)$ in (11) is less than $2\mu^2$; and thus
$|\mu/\sigma| > 1$. Then, by (21), $k = \bar{X} - a\mu = (-c/\sigma)\mu$; and hence $|k| > c$.
Either $k$ is greater than every positive measurement $c$, or it is less than every
negative measurement $-c$. In either case, it is an external mean.

COROLLARY. If in $\Phi(t)$, the $t$ is subjected to a translation $t = u + b$, so that
$\Phi(t) = \Phi(u + b) = \Psi(u)$, then it is always possible to choose $b$ so that the second
moment of $\Psi(u)$ is less than twice the square of its first moment; and thus if a
location $k'$ is obtained from $\Psi(u)$, external means may occur. On the other hand,
by proper choice of $b$, it is possible to make the first moment zero, so that the
location becomes the arithmetic mean $\bar{X}$ of the $X_i$'s.

The first part of this corollary may be seen from (11), which states that Second
Moment $= \mu^2 + \sigma^2$. Translation does not change $\sigma^2$, but it can increase $\mu^2$
indefinitely, making eventually $\mu^2 > \sigma^2$, and thus $\mu^2 + \sigma^2 < 2\mu^2$.


4. Illustration. For the Pearson Type III the simplest specification is
perhaps with the origin at the start. In this case,


(23) $\Phi(t) = (p!)^{-1} e^{-t} t^{p}, \quad p > -1, \quad t \ge 0, \quad t = (x - k)/a.$


Here $p! = \Gamma(1 + p)$. Apart from this numerical factor, $\Phi(t)$ is the integrand
of the Gamma function. With $\Phi(t)$ in this form, it is easily seen that the first
moment is $(p + 1)$ and the second moment is $(p + 1)(p + 2)$. In the usual⁶
case, $p > 0$. Here, then, the second moment is less than twice the square of the
first moment. If, then, there are an even number of measurements, all
numerically equal, with half the measurements positive and the other half negative,
then the estimate $k$ of location as found by the Method of Moments is an
external mean of the measurements. Such conditions, while sufficient, are by
no means necessary for externality.
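A small numerical sketch of this illustration follows. It assumes the Type III specification (23), for which the first moment is $p + 1$ and the variance is $(p+1)(p+2) - (p+1)^2 = p + 1$; the function name and the data are hypothetical.

```python
import math

def type3_location(xs, p):
    """Method-of-moments location k = mean - a * mu for specification (23)."""
    mu = p + 1.0                 # first moment of Phi(t)
    sigma = math.sqrt(p + 1.0)   # standard deviation of Phi(t)
    n = len(xs)
    xbar = sum(xs) / n
    tau = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    a = tau / sigma
    return xbar - a * mu

# Measurements numerically equal, half positive and half negative, with p > 0.
xs = [3.0, -3.0, 3.0, -3.0]
k = type3_location(xs, p=1.0)
```

Here the estimate falls below the smallest measurement, in line with Theorem II; changing `p` to a value in $(-1, 0]$ removes the externality for this particular data set.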


5. Summary. Suppose that the specification for a frequency function in $x$
is $\alpha^{-1}\Phi(t')$, where $t' = (x - \kappa)/\alpha$, and that for the unknown scale $\alpha$ and location
$\kappa$, estimates $a$ and $k$, respectively, are made by the Method of Moments from
a set of $n$ measurements $X_i$ with arithmetic mean $\bar{X}$. Let $\sigma^2$ be the variance of
$\Phi(t')$. Then the estimate $a$ is the root-mean-square of $|X_i - \bar{X}|/\sigma$, an internal
mean. The estimate $k$ of the location is $\bar{X} - \mu a$, where $\mu$ is the first moment
of $\Phi(t')$. This is a substitutive mean of the measurements $X_i$; and it may be
external, either greater than Maximum $X_i$ or less than Minimum $X_i$.


THE UNIVERSITY OF TEXAS 


6 W. Palin Elderton, Frequency Curves and Correlation, Second Edition, p. 91. 





ON THE CHI-SQUARE DISTRIBUTION FOR SMALL SAMPLES¹
By Paul G. Hoel


1. Introduction. The use of what is known as the $\chi^2$ distribution function
for testing goodness of fit involves two types of error. One arises from the fact
that the derivation of this function is based upon rough approximations, while
the other arises from using the integral of this continuous function in place of
summing the proper terms of a discrete set. Both of these errors become
increasingly important as the sample becomes small. The purpose of this paper
is to investigate the nature of this first type of error by finding a better
approximation than the customary one to what might be called the exact continuous $\chi^2$
distribution function.

The method employed is that of generating or characteristic functions, and
consists in expressing successively in expanded form the generating function of
the multinomial, the distribution function of the multinomial, the generating
function of $\chi^2$, and the distribution function of $\chi^2$. Only the first and second
order terms of this final distribution function are evaluated explicitly because
of the increasingly heavy algebra involved. By means of these second order
terms, the nature of the error involved in the use of the customary first order
approximation is investigated.


2. The Generating Function of the Multinomial. Consider $k + 1$ cells
into which observations can fall, and let $p_i$ be the probability that an
observation will fall in cell $i$. If $n$ observations are made, the probability that cell $i$
will contain $a_i$ of these observations is given by the multinomial

$P = \dfrac{n!}{a_1!\,a_2! \cdots a_{k+1}!}\; p_1^{a_1} p_2^{a_2} \cdots p_{k+1}^{a_{k+1}},$

where $\sum_{i=1}^{k+1} a_i = n$. The generating function of this multinomial can be
written² as

$M = \left(p_1 e^{t_1} + \cdots + p_k e^{t_k} + p\right)^n = \Bigl[1 + \sum_{i=1}^{k} p_i\,(e^{t_i} - 1)\Bigr]^n,$

where $a_{k+1}$ is chosen as the dependent variable and $p_{k+1}$ is written as $p$.
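As a numerical sanity check on the closed form of the generating function, the sketch below sums $P\,e^{\sum t_i a_i}$ directly over all cell counts for a small case and compares it with $[1 + \sum p_i(e^{t_i} - 1)]^n$. The function names and the test values are hypothetical; the last cell is treated as the dependent one, so it carries no $t$.

```python
import itertools
import math

def multinomial_mgf_direct(n, probs, ts):
    """Sum of P * exp(sum t_i a_i) over all counts; probs has k + 1 entries, ts has k."""
    k = len(ts)
    total = 0.0
    for counts in itertools.product(range(n + 1), repeat=k):
        used = sum(counts)
        if used > n:
            continue
        all_counts = list(counts) + [n - used]   # dependent last cell
        coef = math.factorial(n)
        for c in all_counts:
            coef //= math.factorial(c)           # exact multinomial coefficient
        prob = float(coef)
        for p_i, c in zip(probs, all_counts):
            prob *= p_i ** c
        total += prob * math.exp(sum(t * c for t, c in zip(ts, counts)))
    return total

def multinomial_mgf_closed(n, probs, ts):
    """Closed form [1 + sum_{i<=k} p_i (e^{t_i} - 1)]^n; zip stops at the k-th cell."""
    return (1.0 + sum(p_i * (math.exp(t) - 1.0) for p_i, t in zip(probs, ts))) ** n

direct = multinomial_mgf_direct(5, [0.2, 0.3, 0.5], [0.1, -0.4])
closed = multinomial_mgf_closed(5, [0.2, 0.3, 0.5], [0.1, -0.4])
```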


¹ Presented to the American Mathematical Society, April 9, 1938.
² Cf. Darmois, Statistique Mathematique, pp. 237-242, for the methods used in this and
the next two paragraphs.




The generating function of the $x_i$ (the deviations $x_i = (a_i - np_i)/\sqrt{n}$ implied
by the shift and change of scale described next) is obtained from that
of the $a_i$ by multiplying $M$ by the proper factor to shift the origin to the mean
and then replacing $t_i$ by, say, $u_i/\sqrt{n}$ to compensate for the change in scale.
Denoting this function by $\varphi$,

$\varphi = e^{-\sqrt{n}\,\sum_{i=1}^{k} p_i u_i}\,\Bigl[1 + \sum_{i=1}^{k} p_i\,(e^{u_i/\sqrt{n}} - 1)\Bigr]^n.$

Consequently,

$\log \varphi = -\sqrt{n}\,\sum_{i=1}^{k} p_i u_i + n \log\Bigl[1 + \sum_{i=1}^{k} p_i\,(e^{u_i/\sqrt{n}} - 1)\Bigr].$

Since the range of the $u_i$ may be selected sufficiently small for convergence, the
logarithm on the right may be expanded in powers of the summation, which in
turn may be expanded in powers of the $u_i$. Terms containing $1/n^{q/2}$ as a factor
will be homogeneous in the $u_i$ of degree $q + 2$. Writing down only the terms
of order $1/n$ and lower, this double expansion gives


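(1) $\log \varphi = \dfrac{1}{2}\Bigl[\sum_{i=1}^{k} (p_i - p_i^2)\,u_i^2 - 2\sum_{i<j} p_i p_j\,u_i u_j\Bigr]$

$\quad + \dfrac{1}{\sqrt{n}}\Bigl[\dfrac{1}{6}\sum_{i=1}^{k} (p_i - 3p_i^2 + 2p_i^3)\,u_i^3 - \dfrac{1}{2}\sum_{i \ne j} (p_i p_j - 2p_i^2 p_j)\,u_i^2 u_j + 2\sum_{i<j<l} p_i p_j p_l\,u_i u_j u_l\Bigr]$

$\quad + \dfrac{1}{n}\Bigl[\dfrac{1}{24}\sum_{i=1}^{k} (p_i - 7p_i^2 + 12p_i^3 - 6p_i^4)\,u_i^4 - \dfrac{1}{6}\sum_{i \ne j} (p_i p_j - 6p_i^2 p_j + 6p_i^3 p_j)\,u_i^3 u_j$

$\qquad - \dfrac{1}{4}\sum_{i<j} (p_i p_j - 2p_i^2 p_j - 2p_i p_j^2 + 6p_i^2 p_j^2)\,u_i^2 u_j^2 + \sum_{j<l;\ i \ne j,l} (p_i p_j p_l - 3p_i^2 p_j p_l)\,u_i^2 u_j u_l$

$\qquad - 6\sum_{i<j<l<m} p_i p_j p_l p_m\,u_i u_j u_l u_m\Bigr] + \cdots.$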


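Hence $\varphi$ can be written in the form

(2) $\varphi = e^{\frac{1}{2}\left[\sum_{i=1}^{k} (p_i - p_i^2)\,u_i^2 - 2\sum_{i<j} p_i p_j\,u_i u_j\right]}\Bigl[1 + \dfrac{A_1}{\sqrt{n}} + \dfrac{A_2}{n} + \cdots\Bigr],$

where $A_1$ is the coefficient of $1/\sqrt{n}$ in (1), $A_2$ is the sum of the coefficient of $1/n$
and $A_1^2/2$, etc.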




















3. The Multinomial Distribution Function. If a distribution function can
be expressed as

(3) $f(x_1, \dots, x_k) = f_0 - \sum_{i=1}^{k} a_i \dfrac{\partial f_0}{\partial x_i} + \sum_{i,j=1}^{k} b_{ij} \dfrac{\partial^2 f_0}{\partial x_i\,\partial x_j} - \cdots,$

where $f_0$ is of the form $c\,e^{-\frac{1}{2}\sum c_{ij} x_i x_j}$ with $|c_{ij}|$ positive definite, then its generating
function³ can be written as

(4) $F(u_1, \dots, u_k) = F_0\Bigl[1 + \sum_{i=1}^{k} a_i u_i + \sum_{i,j=1}^{k} b_{ij}\,u_i u_j + \cdots\Bigr],$

where $F_0$ is the generating function of $f_0$ and is of the form $e^{\frac{1}{2}\sum \alpha_{ij} u_i u_j}$ with $|\alpha_{ij}|$
positive definite. Conversely,⁴ if the generating function of a continuous
distribution is of the form (4), then the distribution function can be expressed by
means of (3). This relationship may be applied to (2) since it can be shown to
be of the form (4).

The coefficients $c_{ij}$ of $f_0$ corresponding to $\varphi$ can be determined by making use
of the fact that the moments of $f_0$ can be evaluated directly by integration or
indirectly by differentiation of $\varphi$. It is sufficient here to equate expressions
for second moments; thus

$\dfrac{\partial^2 \varphi}{\partial u_s\,\partial u_t}\Big|_{u_i = 0} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_s x_t\; c\,e^{-\frac{1}{2}\sum c_{ij} x_i x_j}\,dx_1 \cdots dx_k,$

$\dfrac{\partial^2 \varphi}{\partial u_s\,\partial u_t}\Big|_{u_i = 0} = \begin{cases} -p_s p_t, & s \ne t \\ p_s - p_s^2, & s = t. \end{cases}$

The value of the integral is known⁵ to be $c^{st}$, the reciprocal of the element $c_{st}$
in the determinant $|c_{ij}|$. Hence

$c^{st} = \begin{cases} -p_s p_t, & s \ne t \\ p_s - p_s^2, & s = t. \end{cases}$

But $c_{st}$ can be obtained from $c^{st}$, since it is given by the reciprocal of $c^{st}$. Thus
$c_{st} = \bar{c}^{\,st}/|c^{st}|$, where $\bar{c}^{\,st}$ denotes the cofactor of element $c^{st}$ in $|c^{st}|$.






³ Darmois, loc. cit., p. 242.
⁴ See, for example, S. Kullback, Annals of Mathematical Statistics, vol. 5 (1934), pp.
263-307.

⁵ See, for example, Risser and Traynard, Les Principes de la Statistique Mathematique,
p. 226.













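$|c^{st}| = \begin{vmatrix} p_1 - p_1^2 & -p_1 p_2 & \cdots & -p_1 p_k \\ -p_2 p_1 & p_2 - p_2^2 & \cdots & -p_2 p_k \\ \vdots & \vdots & & \vdots \\ -p_k p_1 & -p_k p_2 & \cdots & p_k - p_k^2 \end{vmatrix} = (-1)^k p_1^2 p_2^2 \cdots p_k^2 \begin{vmatrix} 1 - \frac{1}{p_1} & 1 & \cdots & 1 \\ 1 & 1 - \frac{1}{p_2} & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 - \frac{1}{p_k} \end{vmatrix}.$

This determinant may be evaluated by subtracting the last column from each
of the others and then expanding by minors of the last row. Thus

$|c^{st}| = (-1)^k p_1^2 \cdots p_k^2\,(-1)^k\,\dfrac{1 - \sum_{i=1}^{k} p_i}{p_1 \cdots p_k} = p_1 p_2 \cdots p_k\,p,$

since $\sum_{i=1}^{k} p_i = 1 - p$ from probability considerations. To evaluate $\bar{c}^{\,st}$, delete
row $s$ and column $t$ in $|c^{st}|$, then shift row $t$ to the last row and column $s$ to the
last column. These shifts, together with the sign of the cofactor, change the
sign of the resulting expression; hence

$\bar{c}^{\,st} = -(-1)^{k-1}\,\dfrac{p_1^2 \cdots p_k^2}{p_s p_t} \begin{vmatrix} 1 - \frac{1}{p_1} & 1 & \cdots & 1 \\ 1 & 1 - \frac{1}{p_2} & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{vmatrix} = p_1 p_2 \cdots p_k,$

provided $s \ne t$. Since $\bar{c}^{\,ss}$ is merely $|c^{st}|$ after row $s$ and column $s$ have been
deleted, it may be evaluated exactly as was $|c^{st}|$. Thus

$\bar{c}^{\,ss} = (-1)^{k-1}\,\dfrac{p_1^2 \cdots p_k^2}{p_s^2}\,(-1)^{k-1}\,\dfrac{1 - \sum_{i \ne s} p_i}{p_1 \cdots p_k / p_s} = \dfrac{p_1 \cdots p_k}{p_s}\,(p + p_s).$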




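Combining these results, $c_{st} = \dfrac{1}{p}$ for $s \ne t$ and $c_{ss} = \dfrac{1}{p_s} + \dfrac{1}{p}$, and therefore

(5) $f_0 = c\,e^{-\frac{1}{2}\left[\sum_{i=1}^{k}\left(\frac{1}{p_i} + \frac{1}{p}\right) x_i^2 + \frac{2}{p}\sum_{i<j} x_i x_j\right]}.$


By computing the necessary derivatives of $f_0$, the explicit form of $f$, given by
(3), can be obtained to the desired number of terms. Since such derivatives
contain $f_0$ as a factor, $f$ may be written as

(6) $f = f_0\Bigl[1 + \dfrac{B_1}{\sqrt{n}} + \dfrac{B_2}{n} + \cdots\Bigr],$

where $B_i$ is obtained from $A_i$ of (2) by replacing terms in the $u_i$ with the
corresponding derivative of $f_0$ and then factoring out $f_0$.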


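4. The Generating Function of $\chi^2$. Let this function be denoted by

$G(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{t\chi^2} f_0\Bigl[1 + \dfrac{B_1}{\sqrt{n}} + \dfrac{B_2}{n} + \cdots\Bigr]\,dx_1 \cdots dx_k.$

$\chi^2 = \sum_{i=1}^{k+1} \Bigl(\dfrac{a_i - np_i}{\sqrt{np_i}}\Bigr)^2 = \sum_{i=1}^{k} \Bigl(\dfrac{1}{p_i} + \dfrac{1}{p}\Bigr) x_i^2 + \dfrac{2}{p}\sum_{i<j} x_i x_j;$

consequently $\chi^2$ is, except for a factor of $-\frac{1}{2}$, the quadratic form in $f_0$.
Accordingly, letting $\theta = 1 - 2t$,

$e^{t\chi^2} f_0 = c\,e^{-\frac{\theta}{2}\left[\sum_{i=1}^{k}\left(\frac{1}{p_i} + \frac{1}{p}\right) x_i^2 + \frac{2}{p}\sum_{i<j} x_i x_j\right]},$

and hence

$G(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} c\,e^{-\frac{\theta}{2}\left[\sum_{i=1}^{k}\left(\frac{1}{p_i} + \frac{1}{p}\right) x_i^2 + \frac{2}{p}\sum_{i<j} x_i x_j\right]}\Bigl[1 + \dfrac{B_1}{\sqrt{n}} + \dfrac{B_2}{n} + \cdots\Bigr]\,dx_1 \cdots dx_k.$

Letting $z_i = x_i\sqrt{\theta}$ and denoting the value of $B_i$ after this substitution by $C_i$,

$G(t) = \theta^{-\frac{k}{2}} \int \cdots \int f_0\Bigl[1 + \dfrac{C_1}{\sqrt{n}} + \dfrac{C_2}{n} + \cdots\Bigr]\,dz_1 \cdots dz_k = \theta^{-\frac{k}{2}}\Bigl[1 + \dfrac{1}{n}\int \cdots \int f_0 C_2\,dz_1 \cdots dz_k + \cdots\Bigr],$

since the terms involving odd powers of $1/\sqrt{n}$ are of odd degree in the $z_i$ and
therefore vanish upon integration.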

For the purposes of this paper only the integral which is the coefficient of 1/n 
needs to be evaluated. Since the algebra involved in this evaluation is heavy 
and the formulas become exceedingly long, only a few terms will be written out 
explicitly to indicate the procedure followed. 























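From (1), (2), and (6) it is clear that only fourth and sixth order derivatives
of $f_0$ are needed. As examples

$\dfrac{\partial^4 f_0}{\partial x_i^4} = f_0\Bigl[D_i^4 - 6D_i^2\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr) + 3\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^2\Bigr],$

$\dfrac{\partial^6 f_0}{\partial x_i^6} = f_0\Bigl[D_i^6 - 15D_i^4\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr) + 45D_i^2\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^2 - 15\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^3\Bigr],$

where $D_i = \dfrac{1}{p}\Bigl[x_1 + \cdots + \Bigl(1 + \dfrac{p}{p_i}\Bigr) x_i + \cdots + x_k\Bigr]$. Following the procedure
indicated in (6) and (7), this integral becomes

(8) $\int \cdots \int f_0\,\Bigl\{\dfrac{1}{24}\sum_{i=1}^{k} [p_i - 7p_i^2 + 12p_i^3 - 6p_i^4]\Bigl[\dfrac{D_i^4}{\theta^2} - 6\dfrac{D_i^2}{\theta}\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr) + 3\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^2\Bigr]$

$\quad$ + (similar terms of degree 4 and lower in the $D_i$)

$\quad + \dfrac{1}{72}\sum_{i=1}^{k} [p_i^2 - 6p_i^3 + 13p_i^4 - 12p_i^5 + 4p_i^6]\Bigl[\dfrac{D_i^6}{\theta^3} - 15\dfrac{D_i^4}{\theta^2}\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr) + 45\dfrac{D_i^2}{\theta}\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^2 - 15\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^3\Bigr]$

$\quad$ + (similar terms of degree 6 and lower in the $D_i$)$\Bigr\}\,dz_1 \cdots dz_k.$

When $\theta = 1$, the integral reduces to that of $f_0 B_2$, which in turn is the integral
of a linear combination of derivatives of $f_0$. But the integral of such a
derivative vanishes. As a result, if the integral of $f_0 D_i^2$ has been computed directly,
that of $f_0 D_i^4$ and then that of $f_0 D_i^6$ can be found indirectly by equating the
corresponding bracket to zero for $\theta = 1$. Similarly for the other terms of the above
integral. As examples

$\int \cdots \int f_0 D_i^2\,dz_1 \cdots dz_k = \dfrac{1}{p} + \dfrac{1}{p_i},$

$\int \cdots \int f_0 D_i^4\,dz_1 \cdots dz_k = 3\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^2.$

Upon evaluating all such integrals, (8) reduces to

(9) $\dfrac{1}{24}\sum_{i=1}^{k} (p_i - 7p_i^2 + 12p_i^3 - 6p_i^4)\;3\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^2\Bigl(\dfrac{1}{\theta} - 1\Bigr)^2$

$\quad$ + (similar terms all containing $\bigl(\frac{1}{\theta} - 1\bigr)^2$ as a factor)

$\quad + \dfrac{1}{72}\sum_{i=1}^{k} (p_i^2 - 6p_i^3 + 13p_i^4 - 12p_i^5 + 4p_i^6)\;15\Bigl(\dfrac{1}{p} + \dfrac{1}{p_i}\Bigr)^3\Bigl(\dfrac{1}{\theta} - 1\Bigr)^3$

$\quad$ + (similar terms all containing $\bigl(\frac{1}{\theta} - 1\bigr)^3$ as a factor).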




In order to interpret these results, it is necessary to condense these various
sums of probability expressions. If the terms are arranged in descending powers
of the $p_i$, it will be discovered that certain combinations condense readily.
The condensation in each case lies in recognizing combinations like

$\sum_{i=1}^{k} p_i^4 + 4\sum_{i \ne j} p_i^3 p_j + 6\sum_{i<j} p_i^2 p_j^2 + 12\sum_{j<l;\ i \ne j,l} p_i^2 p_j p_l + 24\sum_{i<j<l<m} p_i p_j p_l p_m = \Bigl(\sum_{i=1}^{k} p_i\Bigr)^4.$

However, some of the terms resulting from multiplying by $1/p_i$ above cannot
be condensed in this fashion until they have been reduced to familiar sums by
using relationships of the following type:

$\sum_{i<j} p_i p_j\Bigl(\dfrac{1}{p_i} + \dfrac{1}{p_j}\Bigr) = (k - 1)\sum_{i=1}^{k} p_i,$

$\sum_{i<j<l} p_i p_j p_l\Bigl(\dfrac{1}{p_i} + \dfrac{1}{p_j} + \dfrac{1}{p_l}\Bigr) = (k - 2)\sum_{i<j} p_i p_j.$
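The two reduction relationships, read here as $\sum_{i<j} p_i p_j (1/p_i + 1/p_j) = (k-1)\sum p_i$ and $\sum_{i<j<l} p_i p_j p_l (1/p_i + 1/p_j + 1/p_l) = (k-2)\sum_{i<j} p_i p_j$, can be confirmed with exact rational arithmetic. The probabilities below are hypothetical values chosen only for the check.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical p_1, ..., p_k with k = 4 (the remaining mass belongs to cell k + 1).
p = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(1, 4)]
k = len(p)
idx = range(k)

# First relationship: pairs i < j.
lhs1 = sum(p[i] * p[j] * (1 / p[i] + 1 / p[j]) for i, j in combinations(idx, 2))
rhs1 = (k - 1) * sum(p)

# Second relationship: triples i < j < l.
lhs2 = sum(
    p[i] * p[j] * p[l] * (1 / p[i] + 1 / p[j] + 1 / p[l])
    for i, j, l in combinations(idx, 3)
)
rhs2 = (k - 2) * sum(p[i] * p[j] for i, j in combinations(idx, 2))
```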


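After all possible condensations have been made, (9) reduces to

$\Bigl(\dfrac{1}{\theta} - 1\Bigr)^2 \dfrac{1}{8}\Bigl[\sum_{i=1}^{k+1} \dfrac{1}{p_i} - (k^2 + 4k + 1)\Bigr] + \Bigl(\dfrac{1}{\theta} - 1\Bigr)^3 \dfrac{1}{24}\Bigl[5\sum_{i=1}^{k+1} \dfrac{1}{p_i} - (3k^2 + 12k + 5)\Bigr].$

As a result, the generating function of $\chi^2$ can be written as

(10) $G(t) = \theta^{-\frac{1}{2}k} + \dfrac{S_1}{n}\Bigl(\theta^{-\frac{1}{2}(k+4)} - 2\theta^{-\frac{1}{2}(k+2)} + \theta^{-\frac{1}{2}k}\Bigr)$

$\quad + \dfrac{S_2}{n}\Bigl(\theta^{-\frac{1}{2}(k+6)} - 3\theta^{-\frac{1}{2}(k+4)} + 3\theta^{-\frac{1}{2}(k+2)} - \theta^{-\frac{1}{2}k}\Bigr)$

$\quad$ + (terms involving higher powers of $1/n$),

where $S_1 = \dfrac{1}{8}\Bigl[\sum_{i=1}^{k+1} \dfrac{1}{p_i} - (k^2 + 4k + 1)\Bigr]$ and $S_2 = \dfrac{1}{24}\Bigl[5\sum_{i=1}^{k+1} \dfrac{1}{p_i} - (3k^2 + 12k + 5)\Bigr]$.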


= 


5. The Distribution Function of $\chi^2$. It is well known that $\theta^{-\frac{1}{2}k} = (1 - 2t)^{-\frac{1}{2}k}$
is the generating function of what is commonly called the $\chi^2$ distribution
function with $k$ degrees of freedom. If this distribution function is denoted by
$F_k(\chi^2)$, then the distribution function corresponding to (10) can be written as

(11) $F_k + \dfrac{S_1}{n}\bigl(F_{k+4} - 2F_{k+2} + F_k\bigr) + \dfrac{S_2}{n}\bigl(F_{k+6} - 3F_{k+4} + 3F_{k+2} - F_k\bigr)$

$\quad$ + (terms involving higher powers of $1/n$).



















The customary test for goodness of fit involves the integral of $F_k(\chi^2)$ from
$\chi^2$ to $\infty$, which has been tabled for values of $\chi^2$ and $k$. The form of (11) is such
that the integral of the term in $1/n$ is easily evaluated by means of this same
table. However, for more accurate results and for theoretical reasons, it is
more elucidating to express these integrals in a more compact form. This is
accomplished by using familiar⁶ expansions for the integral of $F_k(\chi^2)$.
Denoting the integral of the explicit terms of (11) by $P$, it is easy to show that

(12) $P = P_k + \dfrac{1}{n}\,[R_1 S_1 + R_2 S_2],$

where $P_k$ is the customary tabled value for $k$ degrees of freedom and

(13) $R_1 = \dfrac{e^{-\frac{1}{2}\chi^2}\,\chi^k}{2 \cdot 4 \cdots (k + 2)}\,[\chi^2 - (k + 2)], \qquad R_2 = \dfrac{e^{-\frac{1}{2}\chi^2}\,\chi^k}{2 \cdot 4 \cdots (k + 4)}\,[\chi^4 - 2(k + 4)\chi^2 + (k + 4)(k + 2)],$

for $k$ even, while for $k$ odd both $R_1$ and $R_2$ contain an additional factor of $\sqrt{2/\pi}$
and have $1 \cdot 3 \cdots (k + 2)$ and $1 \cdot 3 \cdots (k + 4)$ respectively for denominators.












6. Conclusions. In any given problem the second approximation $P$ can be
calculated easily by means of either (11) or (12) and compared with the
customary first approximation $P_k$. However, the magnitude of this correction
term is of primary interest when $\chi^2$ is near a significance level and when one
or more of $n$, $k$, and $p_i$ is small because the accuracy of $P_k$ is questioned in
those cases.

For $\chi^2$ at the .05 level and for $2 < k < 16$, it is easily shown that $0 < R_1 < .08$
and $-.08 < R_2 < 0$. Clearly $S_2$ is positive, while $S_1$ will be positive if
one or more of the $p_i$ is sufficiently small. Consequently, for those cases of
particular interest, the correction term is surprisingly small partly because
$R_1$ and $R_2$ are so small and partly because they are of opposite sign.

To illustrate this viewpoint consider the following numerical example. Let
$n = 10$, $k = 4$, $\chi^2 = 9.488$, $p_1 = p = \frac{1}{20}$, $p_2 = \frac{4}{20}$, $p_3 = \frac{6}{20}$, $p_4 = \frac{8}{20}$. Then
$S_1 = 2.23$, $S_2 = 6.38$, $R_1 = .056$, $R_2 = -.027$, $P_k = .05$, and $P = .045$. The
correction term of $-.005$ is very small in spite of the fact that this example is an
extreme case to which the customary $\chi^2$ test would not be applied.
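The numerical example can be retraced with a short computation. The sketch below is written under stated assumptions: the cell probabilities are taken to be $\frac{1}{20}, \frac{1}{20}, \frac{4}{20}, \frac{6}{20}, \frac{8}{20}$ (the printed fractions are hard to read, and these are inferred from the reported $S_1$ and $S_2$); $S_1$ and $S_2$ are taken as $\frac{1}{8}[\sum 1/p_i - (k^2+4k+1)]$ and $\frac{1}{24}[5\sum 1/p_i - (3k^2+12k+5)]$; and $R_1$, $R_2$ are obtained as second and third differences of the tabled tail values, which is what integrating (11) term by term gives. The function names are hypothetical.

```python
import math

def chi2_tail_even(x, k):
    """Integral from x to infinity of the chi-square density, k even degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k // 2))

def second_approximation(x, k, n, probs):
    """Return (S1, S2, R1, R2, P) for even k; probs lists all k + 1 cell probabilities."""
    recip = sum(1.0 / q for q in probs)
    s1 = (recip - (k * k + 4 * k + 1)) / 8.0
    s2 = (5.0 * recip - (3 * k * k + 12 * k + 5)) / 24.0
    p0, p2, p4, p6 = (chi2_tail_even(x, k + d) for d in (0, 2, 4, 6))
    r1 = p4 - 2.0 * p2 + p0              # integral of F_{k+4} - 2F_{k+2} + F_k
    r2 = p6 - 3.0 * p4 + 3.0 * p2 - p0   # integral of F_{k+6} - 3F_{k+4} + 3F_{k+2} - F_k
    return s1, s2, r1, r2, p0 + (r1 * s1 + r2 * s2) / n

probs = [1 / 20, 1 / 20, 4 / 20, 6 / 20, 8 / 20]   # assumed reading of the example
s1, s2, r1, r2, p_second = second_approximation(9.488, 4, 10, probs)
```

Under these assumptions the computed quantities agree with the printed figures to about two decimal places; small differences in the third decimal of $R_1$ and $R_2$ are to be expected from rounding.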

As judged by the second order approximation obtained in this paper, the
actual error committed by using the customary first approximation is much
smaller than the order of the neglected terms would indicate, and therefore
the range of applicability of $P_k$ is wider than has been supposed. However,
this investigation has considered only the error due to rough approximations
and leaves untouched the second type of error indicated in the introductory
paragraph.
OREGON STATE COLLEGE


6 Risser and Traynard, loc. cit., p. 251. 





SHORTEST AVERAGE CONFIDENCE INTERVALS 
FROM LARGE SAMPLES 


By S. S. Wilks


1. Introduction. The method of fiducial argument [1, 2] in statistics has 
gained considerable prominence within the last few years as a method of inferring 
the values of population parameters from samples “randomly drawn” from
populations having distribution laws of known functional forms. The method 
has also been shown to be applicable [2] to the problem of inferring the values of 
statistical functions in samples from samples already observed, assuming all 
samples to be drawn from a population with a distribution law of a given func- 
tional form. 

The main ideas of a procedure which is sufficient for carrying out fiducial 
inference for certain cases of a single population parameter may be summed up 
in the following steps: 

(a) A sample is assumed to be "randomly drawn" from a population with a
distribution law $f(x, \theta)$ of known functional form.

(b) A function $\psi(x_1, x_2, \dots, x_n, \theta)$ of the sample values $x_1, x_2, \dots, x_n$ and
parameter $\theta$ is devised, which is a monotonic function of $\theta$ for a given
sample, so that the sampling distribution $G(\psi)\,d\psi$ of $\psi(x_1, x_2, \dots, x_n, \theta_0) = \psi_0$,
say, in samples from the population with $\theta = \theta_0$ is independent of
$\theta_0$ and the $x$'s, except as they enter into $\psi_0$.

(c) For a given probability $\alpha$ a pair of numbers $\psi_1$ and $\psi_2$ is chosen so that
when $\theta = \theta_0$, the probability that $\psi_1 < \psi_0 < \psi_2$ is $1 - \alpha$, or, more briefly,

(1) $P(\psi_1 < \psi_0 < \psi_2 \mid \theta = \theta_0) = 1 - \alpha,$

which can be stated in the alternative form

(2) $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta = \theta_0) = 1 - \alpha.$

(d) $\underline{\theta}$ and $\bar{\theta}$, being functions of $\psi_1$, $\psi_2$ and the sample, are subject to sampling
fluctuations and it can be stated that the probability is $1 - \alpha$ that they
will include the true value of $\theta$, whatever it may be, that is, $\theta_0$, between
them. The statement holds for all values which $\theta_0$ may take on.

The numbers $\underline{\theta}$ and $\bar{\theta}$ are known as fiducial or confidence limits [3] of $\theta_0$ and
$(\underline{\theta}, \bar{\theta})$ a confidence interval for the confidence coefficient $1 - \alpha$. We therefore
have the following rule for making inferences about the unknown number $\theta_0$
once $\psi$ has been found: For a given sample solve the equations

$\psi(x_1, x_2, \dots, x_n, \theta_0) = \psi_1, \qquad \psi(x_1, x_2, \dots, x_n, \theta_0) = \psi_2$




for $\theta_0$. Let $\underline{\theta}$ and $\bar{\theta}$ be the two values of $\theta_0$ formally obtained. The statement
that $\underline{\theta}$ and $\bar{\theta}$ will include the value of $\theta$ in the population actually sampled, if
consistently made in each of an aggregate of cases involving populations having
distributions of the same functional form $f(x, \theta)$, will be correct (theoretically)
in $100(1 - \alpha)$ per cent of the cases.
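The rule can be illustrated by simulation for one familiar case. The sketch assumes a normal population with known standard deviation and takes $\psi = \sqrt{n}\,(\bar{x} - \theta)/\sigma$, which is monotonic in $\theta$ and has a standard normal sampling distribution whatever $\theta_0$ is. This choice of $\psi$, the function names, and all numerical values are assumptions made for the illustration and are not taken from the paper.

```python
import random
from statistics import NormalDist, fmean

def normal_mean_interval(sample, sigma, alpha):
    """Solve psi = psi_1 and psi = psi_2 for theta, with psi = sqrt(n)(xbar - theta)/sigma."""
    n = len(sample)
    xbar = fmean(sample)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # psi_2 = z and psi_1 = -z
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width

def coverage(theta0, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated samples whose interval includes the true theta0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta0, sigma) for _ in range(n)]
        lower, upper = normal_mean_interval(sample, sigma, alpha)
        hits += lower < theta0 < upper
    return hits / trials

observed = coverage(theta0=2.0, sigma=1.5, n=10, alpha=0.05, trials=4000)
```

The observed fraction should be close to $1 - \alpha$ for any choice of `theta0`, which is the sense in which the statement is correct in $100(1 - \alpha)$ per cent of cases.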

If $\psi$ is a function of statistics $t_1$ and $t_2$ of two samples from a population of
known functional form, which is monotonic in each $t$ for given values of the
other, then one can argue fiducially about values of one $t$ from values of the
other one.

For a finite value of $n$ and discrete distributions $f(x, \theta)$, it is not possible to
carry through steps (b), (c), (d) as they are now stated. However, under
certain conditions, it is possible to carry out a procedure for the discrete case
which will allow one to say


(3) $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta = \theta_0) \ge 1 - \alpha.$


$\psi$ functions which have the property that their sampling distributions are
independent of $\theta$ and the $x$'s for a given distribution $f(x, \theta)$ are not, in general,
unique. The question then arises as to how (if possible) one can choose $\psi$
functions and limits $\psi_1$ and $\psi_2$ so as to get confidence intervals for a given $\alpha$,
which are shortest or "best" in some sense. Neyman [4] has investigated the
problem of obtaining "best" confidence intervals for the case of small samples.
The object of this paper is to consider the problem for large samples. Under
fairly general conditions it will be seen that a rather simple asymptotic solution
exists for the large-sample case, which is connected in an essential manner with
the method of maximum likelihood.


2. An asymptotic distribution. Suppose a population $\Pi$ has a distribution
function $f(x, \theta)$, where $x$ is a random variable and $\theta$ a parameter. Actually,
$f(x, \theta)$ may involve several other parameters whose values may be regarded
as fixed throughout the paper. The problem of arguing fiducially about several
parameters simultaneously will not be considered in this paper. In order to
include the case of a discrete as well as a continuous variate $x$, we shall consider
the cumulative distribution function (c.d.f.) $F(x, \theta)$, which is monotonic and is
such that


$F(-\infty, \theta) = 0, \quad F(+\infty, \theta) = 1, \quad F(x + \epsilon, \theta) \ge F(x, \theta),$
$F(x + 0, \theta) = F(x, \theta), \quad$ for $\epsilon > 0$, and $a < \theta < b$.


Thus, $F(x', \theta) = P(x \le x' \mid \theta)$. In the case of a continuous variate $x$, where
$f(x, \theta)$ is a probability density function, then $dF(x, \theta) = f(x, \theta)\,dx$; in the discrete
case $dF(x, \theta) = f(x, \theta)$ where $f(x, \theta)$ is the probability that variate $x$ takes the
value indicated. We shall be interested in continuous functions $g(x)$ for which
the integral $\int g(x)\,dF(x, \theta)$ taken in the Stieltjes sense exists. Limits on integral
signs are understood to be $-\infty$ and $\infty$.







Now consider a sample $O_n$ of $n$ individuals independently drawn from $\Pi_\theta$,
the population for which the c.d.f. is $F(x, \theta)$. Let the values of $x$ in the sample
be $x_1, x_2, \dots, x_n$. The probability element associated with the sample is


(4) $dP_n = \prod_{i=1}^{n} dF(x_i, \theta).$


Let $L = \log dP_n$. Then assuming that $\dfrac{\partial}{\partial\theta} \log dF(x, \theta) = g(x, \theta)$, say, exists for
$\theta = \theta_0$, and for each $x$ (except for a set of probability 0), we have

(5) $\dfrac{\partial L}{\partial \theta} = \sum_{i=1}^{n} g(x_i, \theta).$


In all ordinary problems in statistics $g(x, \theta)$ reduces to $\dfrac{\partial}{\partial\theta} \log f(x, \theta)$ where
$f(x, \theta)$ is probability in the case where $x$ is a discrete random variable and
probability density in the case of a continuous random variable. Let $g_0$ denote
$g(x, \theta_0)$ and $\Bigl(\dfrac{\partial L}{\partial \theta}\Bigr)_0$ denote $\dfrac{\partial L}{\partial \theta}$ with $\theta = \theta_0$. Let


(6) $A_0^2 = E_0[(g_0)^2] = \int g_0^2\,dF(x, \theta_0).$


$E_0(\varphi)$ will be used to denote the mathematical expectation of $\varphi$ in samples from
$\Pi_{\theta_0}$, i.e. when the population distribution is $dF(x, \theta_0)$. We shall consider the
sampling theory of


(7) $\psi_0 = \dfrac{1}{\sqrt{n}\,A_0}\Bigl(\dfrac{\partial L}{\partial \theta}\Bigr)_0$


in large samples from $\Pi_{\theta_0}$.
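Let $\varphi_0^{(n)}(t)$ be the characteristic function of $\psi_0$ for samples from $\Pi_0$; it is
defined by $E_0(e^{it\psi_0})$. Then

(8)    $\varphi_0^{(n)}(t) = \left\{E_0\left[\exp\left(\frac{itg_0}{\sqrt{n}A_0}\right)\right]\right\}^n = \left\{E_0\left[1 + \frac{itg_0}{\sqrt{n}A_0} - \frac{t^2 g_0^2}{2nA_0^2} + \frac{t^3 g_0^3}{n\sqrt{n}A_0^3}(\phi_1 + i\phi_2)\right]\right\}^n$

where $\phi_1$ and $\phi_2$ are real functions of $t$, $x$, $n$ and $\theta_0$, such that if $|E_0[(g_0)^3]| < K < \infty$
(i.e. the third moment of $g_0$ is finite when $\theta = \theta_0$), for $a < \theta_0 < b$, then $E_0[g_0^3\phi_1]$
and $E_0[g_0^3\phi_2]$ are uniformly bounded for some $t$-interval $\delta$ (which includes $t = 0$
as an interior point) for $n$ larger than some $n_0$ and for $\theta_0$ on any fixed subinterval
of the interval $(a, b)$. Suppose that $F(x, \theta)$ is such that

(9)    $\int \frac{\partial}{\partial\theta}\, dF(x, \theta) = \frac{\partial}{\partial\theta} \int dF(x, \theta) = 0.$

This condition implies that the range of $x$ be independent of $\theta$.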

































If $n$ is allowed to increase indefinitely, then we have at once that $\varphi_0^{(n)}(t)$ tends to
$e^{-t^2/2}$ uniformly in the interval $\delta$. We now make use of a theorem [5] which
states that if an unlimited sequence of random variables $x^{(1)}, x^{(2)}, \ldots, x^{(n)}, \ldots$
with c.d.f.'s $F^{(1)}(x), F^{(2)}(x), \ldots, F^{(n)}(x), \ldots$ have respective characteristic
functions $\varphi^{(1)}(t), \varphi^{(2)}(t), \ldots, \varphi^{(n)}(t), \ldots$, then a necessary and sufficient condition
for $F^{(n)}(x)$ to converge uniformly to a c.d.f. $F(x)$ at each point of continuity of
$F(x)$ on the interval $(-\infty, \infty)$ is that the sequence of characteristic functions
converge uniformly to a function $\varphi_1(t)$ on an interval $|t| < \epsilon$ for some $\epsilon > 0$.
The characteristic function $\varphi(t)$ associated with $F(x)$ will then be identical with
$\varphi_1(t)$, and the sequence $\varphi^{(1)}(t), \ldots, \varphi^{(n)}(t), \ldots$ converges to $\varphi(t)$ uniformly in every
finite $t$-interval.

From this theorem it follows at once that, since $e^{-t^2/2}$ is the characteristic
function of a variate distributed normally with mean 0 and variance 1, the
asymptotic c.d.f. of $\psi_0$ for large samples is given by

(10)    $F(\psi_0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\psi_0} e^{-z^2/2}\, dz.$

We may conveniently summarize the foregoing results in the following

THEOREM 1. Let $x_1, x_2, \ldots, x_n$ be the values of $x$ in a sample of independently
drawn individuals from a population $\Pi_0$ which has a c.d.f. $F(x, \theta_0)$, such that
for $a < \theta_0 < b$,
(a) $\frac{\partial}{\partial\theta}\, dF(x, \theta)$ exists for all $x$'s except possibly for a set of probability 0;
(b) $E_0[(g_0)^3]$ is finite, for $n > n_0$;
(c) condition (9) is satisfied.
Then the asymptotic c.d.f. of $\psi_0$ for large samples defined in (7) is given by (10).

The statistical significance of this Theorem is that if we know the functional
form $f(x, \theta)$ (for which the first derivative $f'(x, \theta)$ with respect to $\theta$ exists) of the
distribution function of a population $\Pi$ and if the sample $x_1, x_2, \ldots, x_n$ is
"randomly drawn" from $\Pi_0$, then the quantity

(11)    $\psi_0 = \dfrac{\sum_{i=1}^{n} \dfrac{f'(x_i, \theta_0)}{f(x_i, \theta_0)}}{\sqrt{n}\,\sqrt{E_0\left[\left(\dfrac{f'(x, \theta_0)}{f(x, \theta_0)}\right)^2\right]}}$

is a random variable which is approximately normally distributed with mean 0
and variance 1 in repeated large samples. It will be noticed that the quantity
in the numerator of (11) is simply the derivative with respect to $\theta$, at $\theta = \theta_0$,
of the logarithm of the likelihood of $\theta$ for the given sample. $\psi_0$ is a function of
the sample $O_n$ and the true value $\theta_0$ of the parameter $\theta$, and the thing that makes
$\psi_0$ a random variable is the random nature of the sample; $\theta_0$ is a fixed but
unknown number. Thus, for example, when $1 - \alpha = .95$ in (1) and knowing
that we have "randomly drawn" a large sample $O_n$ from a population $\Pi_0$ with

















distribution $f(x, \theta_0)$ of known functional form, we can say that the probability is
.95 that the sample will produce a value of $\psi_0$ in the interval $-1.96$ to $+1.96$,
that is,

(12)    $P(-1.96 < \psi_0 < 1.96 \mid \theta = \theta_0) = .95.$

This statement holds, whatever may be the value of the unknown $\theta_0$. Now,
the inequality $-1.96 < \psi_0 < 1.96$ is equivalent to the inequality $\underline{\theta} < \theta_0 < \bar{\theta}$
because of the monotonic nature of $\psi_0$ as a function of $\theta_0$. Hence (12) is
equivalent to

(13)    $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta = \theta_0) = .95$

where $\underline{\theta}$ and $\bar{\theta}$ are obtained by solving $\psi_0 = \pm 1.96$ for $\theta_0$. The fiducial limits
$\underline{\theta}$ and $\bar{\theta}$ will thus be functions of the sample and will be subject to sampling
variations. In general, of course, one could choose any probability level $1 - \alpha$,
and find $\psi_\alpha$ so that

(14)    $P(-\psi_\alpha < \psi_0 < \psi_\alpha \mid \theta = \theta_0) = 1 - \alpha$

from which fiducial limits for $\theta_0$ can be found as before.
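The deviate $\psi_\alpha$ in (14) can be read from normal tables, or computed. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist`; the function name `psi_alpha` is introduced here for illustration only and is not part of the paper.

```python
from statistics import NormalDist


def psi_alpha(alpha):
    """Return psi such that P(-psi < Z < psi) = 1 - alpha for a unit normal Z.

    The two tails share alpha equally, so psi is the (1 - alpha/2) quantile.
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    # For alpha = .05 this gives the familiar value of about 1.96.
    print(round(psi_alpha(0.05), 2))
```

For $\alpha = .05$ the computed deviate agrees with the 1.96 used in (12) and (13).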

The extension of Theorem 1 to the case in which the distribution function of
the population $\Pi$ involves several parameters $\theta_1, \theta_2, \ldots, \theta_h$ having values in
some region $R$ of the space of $\theta$'s, is immediate. $\Pi_0$ in this case would be specified
by the values $\theta_{10}, \theta_{20}, \ldots, \theta_{h0}$. In fact, we can state the situation as

THEOREM 1': Let $F(x, \theta_1, \theta_2, \ldots, \theta_h)$ denote the c.d.f. of $x$ and (allowing $i, j, k$
to take on values $1, 2, \ldots, h$) let

$\psi_{i0} = \frac{1}{\sqrt{n}} \left(\frac{\partial L}{\partial\theta_i}\right)_0$, where $L = \sum_{j=1}^{n} \log dF(x_j, \theta_{10}, \theta_{20}, \ldots, \theta_{h0})$,

$g_i = \frac{\partial}{\partial\theta_i} [\log dF(x, \theta_1, \ldots, \theta_h)]$,

$A_{ij} = E_0[g_{i0}\, g_{j0}]$, where $g_{i0} = g_i$ with $\theta_i = \theta_{i0}$.

If, in $R$,
(a) $\frac{\partial}{\partial\theta_i}\, dF(x, \theta_1, \ldots, \theta_h)$ exists for all $x$'s except possibly for a set of probability 0;
(b) $E_0(g_{i0}\, g_{j0}\, g_{k0})$ are all finite;
(c) $\frac{\partial}{\partial\theta_i} \int dF(x, \theta_1, \theta_2, \ldots, \theta_h) = \int \frac{\partial}{\partial\theta_i}\, dF(x, \theta_1, \theta_2, \ldots, \theta_h) = 0$;
(d) $\|A_{ij}\|$ is non-singular;
then the asymptotic distribution of the $\psi_{i0}$ in large samples from $\Pi_0$ is a normal
multivariate distribution with matrix $\|A_{ij}\|$ of variances and covariances, and zero
means.

A similar theorem holds for the case in which $\Pi$ is a multivariate population in
addition to having several parameters.

The question now arises: In what sense is the confidence interval between $\underline{\theta}$ and
$\bar{\theta}$ as determined from $\psi_0$ "best"? It will be shown that the average rate of
change of $\psi_0$ with respect to $\theta$ at $\theta = \theta_0$ is greater than that for a rather broad
class of functions of the $\psi$ type, that is, functions of the observations and $\theta$ which
are asymptotically normally distributed. Since we are dealing with large
samples, we are only interested in values of $\theta$ in the neighborhood of $\theta_0$, for
which $\psi$ as a function of $\theta$ is approximately linear, and demonstrating the property
just stated regarding the average rate of change of $\psi$ with respect to $\theta$ at
$\theta = \theta_0$ is equivalent to showing that the two "values" of $\theta_0$ for which $\psi_0 = \pm\psi_\alpha$
will, on the average, be closer together than those computed from any other $\psi$
function than $\psi_0$ of the class of functions to be considered. This class of functions
will be designated as belonging to class C, and will now be more accurately
defined.


3. Functions of class C and their asymptotic distributions. Following an
argument similar to that used in proving Theorem 1, we can readily prove
THEOREM 2: Let $h(x, \theta)$ be a function in which $x$ has the c.d.f. $F(x, \theta)$, and
which satisfies the following conditions for $a < \theta_0 < b$:
(15)    (a) $E_0[h(x, \theta_0)] = 0$;
        (b) $E_0[\{h(x, \theta_0)\}^3]$ is finite, for $n > n_0$.

Let
(16)    $A_h^2 = E_0[\{h(x, \theta_0)\}^2]$
and for a sample of values $x_1, x_2, \ldots, x_n$ let
(17)    $\psi_0^* = \dfrac{\sum_{i=1}^{n} h(x_i, \theta_0)}{\sqrt{n}\,A_h}.$

Then the asymptotic c.d.f. of $\psi_0^*$ for large samples from $\Pi_0$ is given by

$F(\psi_0^*) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\psi_0^*} e^{-z^2/2}\, dz.$

We shall designate as belonging to class C any function $\psi_0^*$ made up according
to the rule expressed by (17), of functions $h(x, \theta)$ satisfying (a) and (b) in (15)
and such that $\psi_0^*$ is asymptotically normally distributed with zero mean and
unit variance. Clearly, $\psi_0$ as defined by (7) belongs to class C.


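4. Comparison of average confidence intervals computed from $\psi_0$ and $\psi_0^*$.
We shall now show that for each fixed value $\theta_0$ of $\theta$ the average rate of change of
$\psi$ with respect to $\theta$ is greater than that of $\psi^*$ for any $h(x, \theta)$ which is not a
constant multiple of $g(x, \theta)$. Consider $\frac{\partial\psi}{\partial\theta}$ and $\frac{\partial\psi^*}{\partial\theta}$ for a given $n$. We have,

(18)    $\frac{\partial\psi}{\partial\theta} = \frac{1}{\sqrt{n}\,A} \left\{ \sum_{i=1}^{n} \frac{\partial g(x_i, \theta)}{\partial\theta} - \frac{1}{A} \frac{\partial A}{\partial\theta} \sum_{i=1}^{n} g(x_i, \theta) \right\},$

(19)    $\frac{\partial\psi^*}{\partial\theta} = \frac{1}{\sqrt{n}\,A_h} \left\{ \sum_{i=1}^{n} \frac{\partial h(x_i, \theta)}{\partial\theta} - \frac{1}{A_h} \frac{\partial A_h}{\partial\theta} \sum_{i=1}^{n} h(x_i, \theta) \right\},$

and

$\frac{\partial g(x_i, \theta)}{\partial\theta} = \frac{\partial^2}{\partial\theta^2} \log dF(x_i, \theta) = \frac{\frac{\partial^2}{\partial\theta^2}\, dF(x_i, \theta)}{dF(x_i, \theta)} - [g(x_i, \theta)]^2.$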


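Assuming that

(20)    $\int \frac{\partial^2}{\partial\theta^2}\, dF(x, \theta) = \frac{\partial^2}{\partial\theta^2} \int dF(x, \theta) = 0$

and remembering that $E_0[g(x_i, \theta_0)] = 0$, we have

(21)    $E_0\left[\left(\frac{\partial\psi}{\partial\theta}\right)_0\right] = -\sqrt{n}\,A_0$

and

(22)    $E_0\left[\left(\frac{\partial\psi^*}{\partial\theta}\right)_0\right] = \frac{\sqrt{n}}{A_h}\, E_0\left[\left(\frac{\partial h}{\partial\theta}\right)_0\right].$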


Now, since

(23)    $\int h(x, \theta)\, dF(x, \theta) = 0$

and assuming that (23) can be differentiated under the integral sign, we have

(24)    $E_0\left[\left(\frac{\partial h(x, \theta)}{\partial\theta}\right)_0\right] = -E_0[h(x, \theta_0)\, g(x, \theta_0)].$

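Write $\Delta_0$ and $\Delta_h$ for the average rates of change given by (21) and (22)
respectively. For the difference $\Delta_0^2 - \Delta_h^2$ in samples from populations with
$\theta = \theta_0$, we have, on using (24),

(25)    $\Delta_0^2 - \Delta_h^2 = \frac{n}{A_h^2} \left\{ \int \left[\frac{\frac{\partial}{\partial\theta}\, dF(x, \theta_0)}{dF(x, \theta_0)}\right]^2 dF(x, \theta_0) \cdot \int [h(x, \theta_0)]^2\, dF(x, \theta_0) - \left[ \int h(x, \theta_0)\, \frac{\frac{\partial}{\partial\theta}\, dF(x, \theta_0)}{dF(x, \theta_0)}\, dF(x, \theta_0) \right]^2 \right\}.$

Making use of Schwarz' inequality, which states that

$\int g^2(x)\, dx \cdot \int h^2(x)\, dx \ge \left[ \int g(x)\, h(x)\, dx \right]^2,$

where the equality sign holds only if $g(x) = K h(x)$, $K$ being a constant, it is
evident that, independently of $n$, $\Delta_0^2 \ge \Delta_h^2$, and furthermore, the only condition
under which $\Delta_0^2 = \Delta_h^2$ is that

$h(x, \theta)\, \sqrt{dF(x, \theta)} = K\, \frac{\frac{\partial}{\partial\theta}\, dF(x, \theta)}{\sqrt{dF(x, \theta)}},$

that is,
(26)    $h(x, \theta) = K\, g(x, \theta).$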
Therefore we have 

THEOREM 3. If $g(x, \theta)$ and $h(x, \theta)$ satisfy the conditions of Theorems 1 and 2
respectively and furthermore, if (20) is satisfied and if the expression on the left in
(23) can be differentiated under the integral sign with respect to $\theta$, then the average
rate of change of $\psi$ with respect to $\theta$ for each fixed value $\theta_0$ of $\theta$ is greater than that of
$\psi^*$ (for which $h(x, \theta) \ne K g(x, \theta)$) with respect to $\theta$, when $\theta = \theta_0$, in samples from $\Pi_0$.

This Theorem simply means that when computed from $\psi_0$ the fiducial limits
for the true but unknown value $\theta_0$ of the parameter $\theta$, whatever value $\theta_0$ may
have on the interval $a < \theta < b$ of possible values, are (for large samples) closer
together on the average than those computed from any other $\psi_0^*$ of class C.
There is no function $\psi^*$ which is more efficient, as it were, for determining
confidence intervals for $\theta_0$ than the particular $\psi_0$ given by (7), which is $\psi_0^*$ with
$h(x, \theta)$ replaced by $g(x, \theta)$, that is, $\frac{\partial}{\partial\theta} \log dF(x, \theta)$. The actual manner in which
the fiducial limits for $\theta_0$ are found for a given confidence coefficient $1 - \alpha$ is
to set

(27)    $\dfrac{(\partial L/\partial\theta)_0}{\sqrt{n}\,A_0} = \pm\psi_\alpha$

and solve formally for $\theta_0$, where $\psi_\alpha$ is the value for which $\frac{2}{\sqrt{2\pi}} \int_{\psi_\alpha}^{\infty} e^{-z^2/2}\, dz = \alpha$,
which can be found from normal probability tables. The two values of $\theta_0$ thus
found are the fiducial limits $\underline{\theta}$ and $\bar{\theta}$ for the true value $\theta_0$, and we can state that
the probability is $1 - \alpha$ that $\underline{\theta}$ and $\bar{\theta}$ will include the true value $\theta_0$ between them.
This statement is valid whatever may be the value of $\theta_0$ between $a$ and $b$. This
rule, consistently followed for large samples, will produce fiducial limits $\underline{\theta}$ and $\bar{\theta}$
which are closest together on the average, for each fixed value of the probability
level $\alpha$ between 0 and 1. It should be observed that no assumptions have been
made regarding the existence of sufficient statistics.
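The comparison in Theorem 3 can be checked numerically in a simple case. The sketch below assumes a Poisson population with mean m, takes g(x, m) = x/m - 1 as the score, and uses h(x, m) = x^2 - m - m^2 as one arbitrary competitor with zero expectation; this choice of h and all function names are illustrative assumptions, not taken from the paper. It returns the squared average rates of change of $\psi$ and $\psi^*$ per unit of n, that is $A_0^2$ and $(E_0[hg])^2 / A_h^2$.

```python
import math


def poisson_pmf(x, m):
    """Poisson probability of x for mean m, computed on the log scale."""
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))


def expect(fn, m, upper=200):
    """E_0[fn(x)] under Poisson(m), truncating the sum at `upper` terms."""
    return sum(fn(x) * poisson_pmf(x, m) for x in range(upper))


def squared_slopes_per_n(m):
    """Return (A_0^2, (E_0[h g])^2 / A_h^2) for the illustrative h below."""
    def g(x):
        return x / m - 1.0              # d/dm log f(x, m)

    def h(x):
        return x * x - m - m * m        # E_0[h] = 0 since E[x^2] = m + m^2

    a0_sq = expect(lambda x: g(x) ** 2, m)
    ah_sq = expect(lambda x: h(x) ** 2, m)
    e_hg = expect(lambda x: h(x) * g(x), m)
    return a0_sq, e_hg ** 2 / ah_sq


if __name__ == "__main__":
    score, competitor = squared_slopes_per_n(4.0)
    print(score, competitor)   # the first value should be the larger
```

For m = 4 the first value is 1/m = .25, and the second is smaller, as Schwarz' inequality requires whenever h is not a constant multiple of g.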












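5. Examples. EXAMPLE 1. Suppose a large sample of $n$ individuals to be
drawn from a population known to have the Poisson distribution law

$f(x, m) = \dfrac{m^x e^{-m}}{x!}.$

We have

$L = -\log\left(\prod_i x_i!\right) + \left(\sum_i x_i\right) \log m - nm,$

$\left(\frac{\partial L}{\partial m}\right)_0 = \frac{\sum_i x_i}{m_0} - n,$

$A_0^2 = E_0\left[\left(\frac{\partial \log f}{\partial m}\right)_0^2\right] = E_0\left[\left(\frac{x}{m_0} - 1\right)^2\right] = \frac{1}{m_0}.$

The fiducial limits $\underline{m}$ and $\bar{m}$ for $\alpha = .05$, that is, the 95 per cent fiducial limits,
are found by formally solving the equations

$\dfrac{\left(\frac{\partial L}{\partial m}\right)_0}{\sqrt{n}\,A_0} = \dfrac{\sqrt{n}\,(\bar{x} - m_0)}{\sqrt{m_0}} = \pm 1.96$

for $m_0$, where $\bar{x}$ is the mean of the sample. The fiducial limits are found to be

$\bar{x} + \dfrac{(1.96)^2}{2n} \pm 1.96 \sqrt{\dfrac{\bar{x}}{n} + \dfrac{(1.96)^2}{4n^2}}.$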
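The Poisson limits follow from solving the quadratic $n(\bar{x} - m)^2 = (1.96)^2 m$ for $m$. The sketch below carries out that solution; the function name `poisson_fiducial_limits` and its arguments are illustrative assumptions.

```python
import math


def poisson_fiducial_limits(xbar, n, z=1.96):
    """Large-sample limits for a Poisson mean.

    Solves sqrt(n) * (xbar - m) / sqrt(m) = +/- z for m, i.e. the two roots
    of n * (xbar - m)**2 = z**2 * m.
    """
    centre = xbar + z * z / (2.0 * n)
    spread = z * math.sqrt(xbar / n + z * z / (4.0 * n * n))
    return centre - spread, centre + spread


if __name__ == "__main__":
    lower, upper = poisson_fiducial_limits(xbar=4.0, n=100)
    print(lower, upper)
```

Substituting either limit back into $\sqrt{n}(\bar{x} - m)/\sqrt{m}$ returns $+1.96$ for the lower limit and $-1.96$ for the upper.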


EXAMPLE 2. Consider a large sample of $n$ individuals known to be from a
binomial population having the two classes A and B. Let $p$ denote the probability
of an individual's belonging to A, and $q = 1 - p$ that of belonging to B.
Let $x$ denote the number of individuals belonging to A in one drawing from the
population; $x$ will take on only two possible values, 1 and 0, with probabilities
$p$ and $q$ respectively. The population distribution is thus

$f(x, p) = p^x (1 - p)^{1 - x}.$

We have

$L = \left(\sum_i x_i\right) \log p + \sum_i (1 - x_i) \log (1 - p),$

$\left(\frac{\partial L}{\partial p}\right)_0 = \frac{m}{p_0} - \frac{n - m}{1 - p_0} = \frac{m - np_0}{p_0 (1 - p_0)},$

where $m$ is the number of individuals belonging to A in the sample. Furthermore

$A_0^2 = E_0\left[\left(\frac{x}{p_0} - \frac{1 - x}{1 - p_0}\right)^2\right] = \frac{1}{p_0 (1 - p_0)}.$

95 per cent fiducial limits for $p_0$ are got by solving the following equation for $p_0$:

$\dfrac{m - np_0}{\sqrt{n}\,\sqrt{p_0 (1 - p_0)}} = \pm 1.96.$

It will be seen that situations, such as frequently occur in genetics, where $p$
may be a function of some other parameter $\theta$, say $p = u(\theta)$, can be handled by
simply replacing $p_0$ by $u(\theta_0)$ and solving for $\theta_0$.
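Squaring the binomial equation gives the quadratic $(n + z^2) p^2 - (2m + z^2) p + m^2/n = 0$ with $z = 1.96$, whose two roots are the limits. A minimal sketch follows; the function name and arguments are illustrative assumptions, not part of the paper.

```python
import math


def binomial_fiducial_limits(m, n, z=1.96):
    """Large-sample limits for a binomial probability.

    Solves (m - n*p) / (sqrt(n) * sqrt(p*(1-p))) = +/- z for p.
    """
    p_hat = m / n
    centre = 2.0 * n * p_hat + z * z
    spread = z * math.sqrt(z * z + 4.0 * n * p_hat * (1.0 - p_hat))
    denom = 2.0 * (n + z * z)
    return (centre - spread) / denom, (centre + spread) / denom


if __name__ == "__main__":
    lower, upper = binomial_fiducial_limits(m=30, n=100)
    print(lower, upper)
```

When $p = u(\theta)$, the same two roots can be passed through the inverse of $u$, provided $u$ is monotonic over the range concerned.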

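EXAMPLE 3. Let the form of the distribution function be $\theta e^{-\theta x}$, where
$0 \le x < \infty$. For a sample of $n$ individuals,

$L = n \log \theta - \theta \sum_i x_i,$

$\left(\frac{\partial L}{\partial\theta}\right)_0 = \frac{n}{\theta_0} - \sum_i x_i,$

$A_0^2 = \frac{1}{\theta_0^2}.$

The 95 per cent fiducial limits $\underline{\theta}$ and $\bar{\theta}$ are given by solving the equations

$\dfrac{\frac{n}{\theta_0} - \sum_i x_i}{\sqrt{n}\,(1/\theta_0)} = \pm 1.96$

for $\theta_0$. We get

$\underline{\theta} = \dfrac{1 - 1.96/\sqrt{n}}{\bar{x}}, \qquad \bar{\theta} = \dfrac{1 + 1.96/\sqrt{n}}{\bar{x}},$

where $\bar{x}$ is the mean of the sample.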
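For the exponential law the defining equation reduces to $\sqrt{n}\,(1 - \theta_0 \bar{x}) = \pm 1.96$, which is linear in $\theta_0$. The sketch below evaluates the two limits; the function name and arguments are illustrative assumptions.

```python
import math


def exponential_fiducial_limits(xbar, n, z=1.96):
    """Large-sample limits for theta in the law theta * exp(-theta * x).

    Solves sqrt(n) * (1 - theta * xbar) = +/- z for theta.
    """
    half_width = z / math.sqrt(n)
    return (1.0 - half_width) / xbar, (1.0 + half_width) / xbar


if __name__ == "__main__":
    lower, upper = exponential_fiducial_limits(xbar=2.0, n=400)
    print(lower, upper)
```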
PRINCETON UNIVERSITY. 


REFERENCES 


[1] R. A. Fisher, "The Concepts of Inverse Probability and Fiducial Probability Referring
to Unknown Parameters," Proc. Royal Society of London, Series A, vol. 139
(1933), pp. 343-348.

[2] R. A. Fisher, "The Fiducial Argument in Statistical Inference," Annals of Eugenics,
vol. 6 (1935), pp. 391-398.

[3] J. Neyman, "On the Two Different Aspects of the Representative Method: the Method
of Stratified Sampling and the Method of Purposive Selection," Royal Statistical
Society, vol. 97 (1934), pp. 558-625.

[4] J. Neyman, "Outline of a Theory of Statistical Estimation Based on the Classical
Theory of Probability," Phil. Trans. Roy. Soc. London, Series A, vol. 236 (1937),
pp. 333-380.

[5] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in
Mathematics and Mathematical Physics, No. 36, Cambridge University Press,
1937.





TRANSFORMATIONS OF THE PEARSON TYPE III DISTRIBUTION 
By A. C. OLSHEN 
I. INTRODUCTORY 


Transformations of the normal curve have been used as a basis for the repre- 
sentation of skew frequency distributions by Edgeworth, Kapteyn, Van Uven, 
Bernstein, and others. Various studies have been made of the distributions 
obtained by replacing each of a set of normally distributed variates by a loga- 
rithmic function of the variates. Among the earlier investigators along this 
line were Galton, and McAllister; later, works by Jorgensen, Fisher, Wicksell, 
Davies, and a more recent study by Pae-Tsi-Yuan, were added. 

Rietz¹ restated and treated, in a general fashion, the question as to the properties
of the distribution of powers of a set of variates which are known to be
normally distributed. By a suitable choice for the origin of the normal curve, 
he obtained results which are applicable in answering questions which frequently 
arise in the applied field concerning the properties of families of interrelated 
distributions, one strain of which is known to be normally distributed. For 
example, in the family made up of the diameters, surface areas, volumes, etc. of 
some physical quantity, if it were known that one set, the surface areas for 
instance, were distributed normally, then from his results we have the properties 
of the distributions of any of the other sets. 

Likewise it has seemed of interest to investigate, in a similar fashion, the 
properties of the transformed Type III Pearson distribution. We shall treat 
both the power and logarithmic transformations. For instance, if we knew 
that any one of the physical measurements, velocity, kinetic energy, momentum, 
or centrifugal force (all of which are functions of the velocity) were distributed
according to a Type III curve, then we raise the question as to the properties 
of the distributions of any of the others. Similarly, if the intensity of certain 
light, J, were known to be distributed according to a Type III law, we will 
discuss the properties of the distribution of the brightness, B, of the light as 
seen by the eye, since the two are known to be related by the law B = K log J. 
The same analysis applies to the relationship between L, the loudness of a 
sound, and E, the energy in the sound wave, since L = K log E. 

Two forms of the Type III distribution will be considered. In the first form, 
all the variates are taken positive; in the second form, the origin is at the mean 
and the variates are measured in units of standard deviation. 


¹ H. L. Rietz, Frequency Distributions Obtained by Certain Transformations of Normally
Distributed Variates, Annals of Math., Vol. 23 (1922), pp. 291-300.

























In the last section, a transformation is developed which will transform the
ordinates of a given probability function into the ordinates of the normal curve,
$y = Ce^{-t^2/2}$, to within certain approximations. This transformation is applied
to the Type III distribution and to the distribution obtained under power
transformations of variates of the Type III distribution.


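II. POWER TRANSFORMATIONS

a. Type III curve with all variates positive.
Given the Pearson Type III law,

(1)    $y = y_0\, x^{\gamma\bar{x} - 1} e^{-\gamma x}, \qquad 0 \le x < \infty,$

where

(2)    $\gamma = \dfrac{2\mu_2}{\mu_3}, \qquad y_0 = \dfrac{\gamma^{\gamma\bar{x}}}{\Gamma(\gamma\bar{x})}, \qquad \gamma\bar{x} > 1, \qquad x_{mo.}{}^2 = \bar{x} - \dfrac{1}{\gamma}.$

The probability function (1) is a single-valued, real-valued, non-negative,
continuous function of $x$ with $\int_0^\infty y\, dx = 1$. The probability that a variate
chosen at random will fall into the interval $a_1$ to $a_2$ is given by

(3)    $P = \int_{a_1}^{a_2} y\, dx.$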
a 


Let us make a transformation by replacing each variate $x$ by $x'$, where $x' = x^n$,
and $n$ is a real number on which restrictions will be placed as we proceed. When
$n$ is such that $x'$ may have more than one value corresponding to an assigned
value of $x$ we shall consider only the principal value of $x'$. Then $dx' = nx^{n-1}\, dx$,
except at $x = 0$ when $n < 1$, and $dx = \dfrac{dx'}{n\, x'^{(n-1)/n}}$, except at $x' = 0$ when $n > 1$, or
$n < 0$.

The frequency function of the $x'$ variates is given by

(4)    $f(x') = \dfrac{y_0}{n}\, x'^{(\gamma\bar{x} - n)/n}\, e^{-\gamma x'^{1/n}},$

which does not represent a Type III curve when $n \ne 1$. The function (4) is
discontinuous at $x' = 0$ if $\dfrac{\gamma\bar{x}}{n} < 1$. Likewise, corresponding to (3) we have

(5)    $P = \dfrac{y_0}{n} \int_{a_1^n}^{a_2^n} x'^{(\gamma\bar{x} - n)/n}\, e^{-\gamma x'^{1/n}}\, dx'.$

² The expression $x_{mo.}$ represents the mode, and $x_{md.}$ represents the median.




In order to study the maxima and minima points of (4) we take the derivative

(6)    $\dfrac{df(x')}{dx'} = f(x')\,(nx')^{-1} \left\{ -\gamma x'^{1/n} + \gamma\bar{x} - n \right\}.$

The derivative changes signs at

(7)    $x'_{mo.} = \left(\bar{x} - \dfrac{n}{\gamma}\right)^n.$

Thus, variates in an interval, $dx'$, at the mode of the new distribution (4) came,
by transformation, from an interval in the neighborhood of $x = \bar{x} - \dfrac{n}{\gamma}$, which is
to the left of the mode of (1) when $n > 1$. The function (4) will be a monotone
decreasing or a unimodal continuous distribution with mode given by (7)
according as $\bar{x}$ is equal to or less than, or is greater than, $\dfrac{n}{\gamma}$.


It will prove convenient to discuss the properties of (4) under three headings,
according as $n > 1$, $0 < n < 1$, and $n < 0$, where $n$ or its reciprocal is an integer.

Case I. $n > 1$.

When $\bar{x} < \dfrac{n}{\gamma}$, (4) is a monotone decreasing function, infinite at the origin and
asymptotic to the $x'$ axis; in this case the distribution of $x'$ is similar to the
distribution arising in the corresponding transformation of a set of normally
distributed³ variates, when $\bar{x}^2 < 4(n - 1)$, where $\bar{x}$ is the arithmetic mean of the
$x$'s of the normal curve. However, we are primarily interested in the case
when $\bar{x} > \dfrac{n}{\gamma}$, under which condition a mode exists on the frequency curve $f(x')$
and is given by $x'_{mo.} = \left(\bar{x} - \dfrac{n}{\gamma}\right)^n$. Henceforth in discussing the comparative
values of the measures of central tendency, it will be assumed that the condition
$\bar{x} > \dfrac{n}{\gamma}$ is satisfied. We have,

$\left(\bar{x} - \dfrac{n}{\gamma}\right)^n < \left(\bar{x} - \dfrac{1}{\gamma}\right)^n$, where $x_{mo.} = \bar{x} - \dfrac{1}{\gamma}$.

Thus, while variates at the modal value of $x$ in the Type III distribution transform
into $x' = \left(\bar{x} - \dfrac{1}{\gamma}\right)^n$, the mode of the new distribution is at $x'_{mo.} = \left(\bar{x} - \dfrac{n}{\gamma}\right)^n$
which, when $n > 1$, is to the left of the positions to which variates at the mode
of the Type III distribution were transformed. Furthermore, as $n$ increases,
$x'_{mo.}$ approaches the origin.

³ Cf. Rietz, loc. cit. p. 296.




































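The arithmetic mean of the $x'$'s distributed in accord with the function (4) is
given by

(8)    $\bar{x}' = \mu'_1 = \dfrac{y_0}{n} \int_0^\infty x'^{\gamma\bar{x}/n}\, e^{-\gamma x'^{1/n}}\, dx' = \dfrac{\Gamma(\gamma\bar{x} + n)}{\gamma^n\, \Gamma(\gamma\bar{x})}.$

Similarly the $s$th moment about the origin is

(9)    $\mu'_s = \dfrac{\Gamma(\gamma\bar{x} + sn)}{\gamma^{sn}\, \Gamma(\gamma\bar{x})} = \dfrac{(\gamma\bar{x} + sn - 1)^{(sn)}}{\gamma^{sn}}.$

For integral $n$,

$\dfrac{\Gamma(\gamma\bar{x} + n)}{\gamma^n\, \Gamma(\gamma\bar{x})} = \left(\bar{x} + \dfrac{n - 1}{\gamma}\right) \left(\bar{x} + \dfrac{n - 2}{\gamma}\right) \cdots (\bar{x}),$

which is greater than $(\bar{x})^n$, hence $\bar{x}' > \bar{x}^n$. Thus, while variates at the mean
value of $x$ in (1) transform into $(\bar{x})^n$, the mean of (4) is at $\dfrac{\Gamma(\gamma\bar{x} + n)}{\gamma^n\, \Gamma(\gamma\bar{x})}$, which is to
the right of the positions to which variates at the mean of (1) were transformed.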
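The gamma-function form of the mean in (8) is easy to evaluate on the log scale. The sketch below does so and compares the result with $(\bar{x})^n$; the function name `mean_of_power` and the sample values are illustrative assumptions.

```python
import math


def mean_of_power(gamma, xbar, n):
    """Mean of x' = x**n when x follows the Type III law (1).

    Evaluates Gamma(gamma*xbar + n) / (gamma**n * Gamma(gamma*xbar)) using
    log-gamma to avoid overflow for large arguments.
    """
    k = gamma * xbar
    return math.exp(math.lgamma(k + n) - math.lgamma(k) - n * math.log(gamma))


if __name__ == "__main__":
    gamma, xbar = 2.0, 3.0
    for n in (2, 3):
        # For n > 1 the transformed mean should exceed xbar**n.
        print(n, mean_of_power(gamma, xbar, n), xbar ** n)
```

With $\gamma = 2$ and $\bar{x} = 3$ the product form gives $(3 + \tfrac{1}{2})(3) = 10.5$ for $n = 2$, against $\bar{x}^2 = 9$.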

We have 





But, 


(10) : z ye 
: : 


hence 
(11) 


In 1895, Karl Pearson⁴ showed that the median of the Type III curve was
approximately two-thirds of the distance from the mode to the mean, and later
Doodson⁵ gave similar results. The analysis of (1) along this line is given in
Section IV. However, since $x_{mo.} = \bar{x} - \dfrac{1}{\gamma}$, we may take $x_{md.} = \bar{x} - \dfrac{1}{c\gamma}$, where
$c > 1$ (approximately equal to 3). Then $x'_{md.} = \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$. We have

(12)    $\left(\bar{x} - \dfrac{n}{\gamma}\right)^n < \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n < (\bar{x})^n,$

hence from (10)

(13)    $x'_{mo.} < x'_{md.} < \bar{x}'.$

⁴ Karl Pearson, Skew Variation in Homogeneous Material, Philosophical Transactions,
Vol. 186A, part 1 (1895), pp. 343-414.

⁵ Arthur T. Doodson, Relation of Mode, Median and Mean in Frequency Curves, Biometrika,
Vol. 11 (1917), p. 425.


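Considering the case when $n = 2$, we have with the aid of (9),

${}_2\beta_1 = \dfrac{8\,(5\gamma^2\bar{x}^2 + 17\gamma\bar{x} + 15)^2}{(\gamma\bar{x})(\gamma\bar{x} + 1)(2\gamma\bar{x} + 3)^3},$

${}_2\beta_2 = \dfrac{3\,(4\gamma^4\bar{x}^4 + 72\gamma^3\bar{x}^3 + 337\gamma^2\bar{x}^2 + 629\gamma\bar{x} + 420)}{(\gamma\bar{x})(\gamma\bar{x} + 1)(2\gamma\bar{x} + 3)^2}.$

From the moments of (1), one readily gets

$\beta_1 = \dfrac{4}{\gamma\bar{x}}, \qquad \beta_2 = \dfrac{3(\gamma\bar{x} + 2)}{\gamma\bar{x}}.$

It can be shown easily that ${}_2\beta_1 > \beta_1$ and ${}_2\beta_2 > \beta_2$; hence the distribution of
the squares of variates is more leptokurtic and more skew than the original
distribution.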
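The two closed forms for the squared variates can be checked exactly against the raw moments in (9). The sketch below uses rational arithmetic with $\gamma = 1$, which loses nothing since both $\beta$'s are free of scale; `k` stands for $\gamma\bar{x}$, and all function names are illustrative assumptions.

```python
from fractions import Fraction


def raw_moment(k, r):
    """E[x**r] for law (1) with gamma = 1: k (k+1) ... (k+r-1)."""
    out = Fraction(1)
    for j in range(r):
        out *= k + j
    return out


def betas_of_square(k):
    """beta_1 and beta_2 of x' = x**2, built from raw moments of x."""
    k = Fraction(k)
    m1, m2, m3, m4 = (raw_moment(k, 2 * s) for s in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def betas_closed_form(k):
    """The closed forms quoted in the text, written in terms of k = gamma*xbar."""
    k = Fraction(k)
    b1 = 8 * (5 * k ** 2 + 17 * k + 15) ** 2 / (k * (k + 1) * (2 * k + 3) ** 3)
    b2 = (3 * (4 * k ** 4 + 72 * k ** 3 + 337 * k ** 2 + 629 * k + 420)
          / (k * (k + 1) * (2 * k + 3) ** 2))
    return b1, b2


if __name__ == "__main__":
    for k in (4, Fraction(7, 2), 10):
        print(k, betas_of_square(k) == betas_closed_form(k))
```

The same values can be compared with $\beta_1 = 4/k$ and $\beta_2 = 3(k + 2)/k$ of the original distribution to see the increase in skewness and kurtosis.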

From (10) and (12) it is evident that the mode approaches neither the median
nor the mean as $n$ increases, subject to the condition $\gamma\bar{x} > n$. Each of the
ratios of the mode to the median and to the mean approaches the limit 1 as $\bar{x}$ is
increased indefinitely, the rapidity of approach to the limiting value depending
on the size of $n$.

Taking the second derivative to find the points of inflection of the function
(4), we have

$\dfrac{d^2 f(x')}{dx'^2} = f(x')\,(nx')^{-2} \left\{ \gamma^2 x'^{2/n} + \gamma x'^{1/n}\,(3n - 2\gamma\bar{x} - 1) + (\gamma^2\bar{x}^2 - 3n\gamma\bar{x} + 2n^2) \right\}.$

When the points of inflection exist they are given by

(14)    $x'^{1/n} = \dfrac{(2\gamma\bar{x} - 3n + 1) \pm \sqrt{n^2 - 6n + 1 + 4\gamma\bar{x}}}{2\gamma}.$

Under the restriction that $\gamma\bar{x} > n$, the expression under the radical in (14)
cannot vanish, and will always be positive.
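The mode (7) and the inflection points (14) can be checked numerically against the frequency function (4). The sketch below evaluates (4) up to its constant factor and tests curvature by a second difference; the function names and the sample values $\gamma = 2$, $\bar{x} = 5$, $n = 2$ are illustrative assumptions.

```python
import math


def log_density(xp, gamma, xbar, n):
    """log f(x') from (4), omitting the constant log(y_0 / n)."""
    k = gamma * xbar
    return (k / n - 1.0) * math.log(xp) - gamma * xp ** (1.0 / n)


def mode(gamma, xbar, n):
    """Mode of (4) as given by (7); requires xbar > n / gamma."""
    return (xbar - n / gamma) ** n


def inflection_points(gamma, xbar, n):
    """Positive solutions of (14), returned as values of x'."""
    k = gamma * xbar
    root = math.sqrt(n * n - 6.0 * n + 1.0 + 4.0 * k)
    base = 2.0 * k - 3.0 * n + 1.0
    return [((base + s * root) / (2.0 * gamma)) ** n
            for s in (-1.0, 1.0) if base + s * root > 0.0]


if __name__ == "__main__":
    print(mode(2.0, 5.0, 2.0))
    print(inflection_points(2.0, 5.0, 2.0))
```

With these values the mode falls at $x' = 16$ and the curve changes from convex to concave and back at the two computed points.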
Case II. $0 < n < 1$.

We now consider the distribution obtained by taking positive integral roots
of a set of variates distributed in accord with (1). The mode of $f(x')$, as given by
(7), will always exist since from (2), $\gamma\bar{x} > 1 > n$. We have

$\left(\bar{x} - \dfrac{n}{\gamma}\right)^n < \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$ if $n > \dfrac{1}{c}$,

$\left(\bar{x} - \dfrac{n}{\gamma}\right)^n > \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$ if $n < \dfrac{1}{c}$,

$\left(\bar{x} - \dfrac{n}{\gamma}\right)^n = \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$ if $n = \dfrac{1}{c}$.

Hence it is evident that $x'_{mo.}$ is less than, greater than, or equal to $x'_{md.}$ according
as $n$ is greater than, less than, or equal to $1/c$. In power transformations of
symmetrical distributions⁶ the results differ in that the modal value is always
greater or less than the median according as the value of $n$ lies between 0 and 1
or outside of these bounds.

Here, $\left(\bar{x} - \dfrac{n}{\gamma}\right)^n > \left(\bar{x} - \dfrac{1}{\gamma}\right)^n$, hence in contrast to Case I, the mode of the new
distribution is to the right of the position to which variates at the mode of the
Type III distribution were transformed.

It has been shown⁷ for every set of positive values that $\mu'_n > (\bar{x})^n$ when $n$ lies
outside of the interval 0 to 1, and that $\mu'_n < (\bar{x})^n$ when $n$ lies between 0 and 1.
We have then

(16)    $\bar{x}' = \dfrac{\Gamma(\gamma\bar{x} + n)}{\gamma^n\, \Gamma(\gamma\bar{x})} < (\bar{x})^n.$

The mean of the new distribution is to the left of the position to which the
variates at the mean of the Type III distribution were transformed.

In Section IV, with the aid of certain approximating assumptions, it will be
shown that $\bar{x}' > x'_{md.}$ when $x'_{md.} > x'_{mo.}$ and conversely, $\bar{x}' < x'_{md.}$ when $x'_{md.} < x'_{mo.}$.


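Case III. $n < 0$.

Let $n = -m$, where $m$ is a positive integer. Then we have

(17)    $f(x') = \dfrac{y_0}{m} \left(\dfrac{1}{x'}\right)^{(\gamma\bar{x} + m)/m} e^{-\gamma (1/x')^{1/m}}, \qquad \lim_{x' \to 0} f(x') = 0.$

In place of (6) we have

(18)    $\dfrac{df(x')}{dx'} = -f(x') \left(m\, x'^{(m+1)/m}\right)^{-1} \left\{ (\gamma\bar{x} + m)\, x'^{1/m} - \gamma \right\}$

and (7) becomes

(19)    $x'_{mo.} = \left(\bar{x} + \dfrac{m}{\gamma}\right)^{-m}.$

But

$\left(\bar{x} + \dfrac{m}{\gamma}\right)^{-m} < \left(\bar{x} - \dfrac{1}{\gamma}\right)^{-m}$

and

$\bar{x}' = \dfrac{\gamma^m\, \Gamma(\gamma\bar{x} - m)}{\Gamma(\gamma\bar{x})} = \dfrac{1}{\left(\bar{x} - \frac{1}{\gamma}\right) \left(\bar{x} - \frac{2}{\gamma}\right) \cdots \left(\bar{x} - \frac{m}{\gamma}\right)} > (\bar{x})^{-m}.$

⁶ H. L. Rietz, On Certain Properties of Frequency Distributions, Proc. National Academy
of Science, Vol. 13 (1927), p. 820.

⁷ J. L. W. V. Jensen, On Convex Functions and the Inequalities between the Means,
Acta Mathematica, Vol. 30, pp. 175-193.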

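Hence, as in Case I, the mode of (17) is to the left of the position to which
variates at the mode of (1) were transformed while the mean of (17) is to the
right of the position to which the variates at the mean of (1) were transformed.
Since

$\left(\bar{x} + \dfrac{m}{\gamma}\right)^{-m} < \left(\bar{x} - \dfrac{1}{c\gamma}\right)^{-m},$

we have $x'_{mo.} < x'_{md.}$. Also

$\left(\bar{x} - \dfrac{1}{\gamma}\right) \left(\bar{x} - \dfrac{2}{\gamma}\right) \cdots \left(\bar{x} - \dfrac{m}{\gamma}\right) < \left(\bar{x} - \dfrac{1}{c\gamma}\right)^m,$

hence $\bar{x}' > x'_{md.}$. Therefore

(20)    $x'_{mo.} < x'_{md.} < \bar{x}'.$

As a special case, when $n = -1$, (17) reduces to

(21)    $f(x') = y_0\, x'^{-(\gamma\bar{x} + 1)}\, e^{-\gamma/x'},$

which is a Pearson Type V distribution.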

b. Type III curve with mean zero and unit variance. 

Even though the form of the Type III distribution with which we have been 
dealing, wherein all the variates are positive, is more closely akin to actual 
distributions that may arise in applied problems, nevertheless it will be of interest 
to examine the properties of the transformed curve when the mean is taken as 
zero, with unit variance. 

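The second and third moments about the mean of the distribution (1) are
$\mu_2 = \dfrac{\bar{x}}{\gamma}$, $\mu_3 = \dfrac{2\bar{x}}{\gamma^2}$. If we write $\alpha_3$ for the third standard moment $\dfrac{\mu_3}{\mu_2^{3/2}}$, then

(22)    $\gamma\bar{x} = \dfrac{4}{\alpha_3^2}.$

Measuring the variates from the mean in units of the standard deviation,
$t = (x - \bar{x})/\sqrt{\mu_2}$, the law (1) takes the form

(24)    $f(t) = C \left(1 + \dfrac{\alpha_3}{2}\, t\right)^{4/\alpha_3^2 - 1} e^{-2t/\alpha_3}, \qquad -\dfrac{2}{\alpha_3} \le t < \infty,$

where

$C = \dfrac{k^{k - 1/2}\, e^{-k}}{\Gamma(k)}, \qquad k = \dfrac{4}{\alpha_3^2}.$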




Equation (22) lends itself to a simple interpretation of the restriction made in
Section IIa, Case I, that $\gamma\bar{x} > n$ in order that the mode of (4) exist and be given
by (7). The upper bound in the values of $\alpha_3$ considered by Salvosa⁸ in the
computation of his tables was $\alpha_3 = 1.1$. Upon examination of the tables it is
obvious that in most cases the skewness of the Type III distribution, as measured
by $\alpha_3$, will be less than 1.1. Hence in most cases we will have $3.22 < \gamma\bar{x} < \infty$.

The effect of the limitations imposed by the condition $\gamma\bar{x} > n$ may be inferred to
some extent from the following table.


TABLE I
The upper bound of $\alpha_3$ for the existence of a mode in Case I, Section IIa

   n     |   2  |   3  |   4  |   5  |   6  |   7  |   8  |   9  |  10  |  25  |  50  | 100
$\alpha_3$ | 1.41 | 1.15 | 1.00 |  .89 |  .82 |  .76 |  .71 |  .67 |  .63 |  .40 |  .28 |  .20
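The entries of Table I can be regenerated from the condition $\gamma\bar{x} > n$. The sketch below assumes the standard Type III relation $\gamma\bar{x} = 4/\alpha_3^2$, so that the bound is $\alpha_3 < 2/\sqrt{n}$; the function name is an illustrative assumption.

```python
import math


def alpha3_upper_bound(n):
    """Largest alpha_3 for which gamma*xbar = 4 / alpha_3**2 still exceeds n."""
    return 2.0 / math.sqrt(n)


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100):
        print(n, round(alpha3_upper_bound(n), 2))
```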


When we make a transformation by replacing each variate $t$ in (24) by $t'$,
where $t' = t^n$ ($n \ne 0$) and $n$ is an integer (positive or negative) or the reciprocal
of an integer, then $dt' = nt^{n-1}\, dt$, except at $t = 0$ when $n < 1$, and $dt = \dfrac{dt'}{n\, t'^{(n-1)/n}}$,
except at $t' = 0$ when $n > 1$, or $n < 0$. The function (24) becomes

(25)    $f(t') = \dfrac{C \left(1 + \dfrac{\alpha_3}{2}\, t'^{1/n}\right)^{4/\alpha_3^2 - 1} e^{-(2/\alpha_3)\, t'^{1/n}}}{n\, t'^{(n-1)/n}},$

$C$ denoting the constant factor of (24). The distribution function, $f(t')$, is
infinite at $t' = 0$ when $n > 1$. In place of (5), we have

(26)    $P = \int_{a_1^n}^{a_2^n} f(t')\, dt'.$

Here $a_1$ and $a_2$ are taken to be positive or zero when $n$ is even. When $n$ is odd $a_1$
and $a_2$ may be taken negative as (25) will give the frequency curve for negative
values of $t'$ that arise from setting $t' = t^n$ when $t$ is negative.


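Examining for maxima and minima points, we have

(27)    $\dfrac{df(t')}{dt'} = -f(t') \left\{ nt' \left(1 + \dfrac{\alpha_3}{2}\, t'^{1/n}\right) \right\}^{-1} \left\{ t'^{2/n} + \dfrac{n\alpha_3}{2}\, t'^{1/n} + (n - 1) \right\}.$

⁸ Luis R. Salvosa, Tables of Pearson's Type III Function, Annals of Mathematical
Statistics, Vol. 1 (1930), pp. 191-198.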




The derivative changes signs at

(28)    $t'^{1/n} = \dfrac{1}{2} \left\{ -\dfrac{n\alpha_3}{2} \pm \sqrt{\left(\dfrac{n\alpha_3}{2}\right)^2 - 4(n - 1)} \right\}$

when $\left(\dfrac{n\alpha_3}{2}\right)^2 > 4(n - 1)$, and at $t' = 0$ for certain values of $n$. When $n > 1$,
and positive, the derivative changes signs at $t' = 0$. If $n$ is the reciprocal of an
odd positive integer greater than one, there is a minimum at $t' = 0$, and the
function $f(t')$ is zero at this point. Further properties of the frequency curves
given by (25) will be discussed under the three cases treated in Section IIa.
Case I. $n > 1$.

$\left(\dfrac{n\alpha_3}{2}\right)^2 > 4(n - 1)$ only for very large values of $n$ since $\alpha_3 < 2$.
When $\left(\dfrac{n\alpha_3}{2}\right)^2 > 4(n - 1)$, it can be shown that (28) gives neither a maximum
nor minimum point of $f(t')$ since $t'^{1/n}$ will always be less than $t_l$.⁹ Similarly,
when $\left(\dfrac{n\alpha_3}{2}\right)^2 < 4(n - 1)$, there is neither a maximum nor minimum since (28)
is imaginary. When $n$ is odd, $f(t')$ is infinite at the origin and is a monotone
increasing function of $t'$ from the lower bound to the origin, and a monotone
decreasing function of $t'$ from the origin. When $n$ is even, $f(t')$ is a monotone
decreasing function of $t'$, infinite at the origin. The forms of the distributions
in this case are similar to those arising in power transformations of normally
distributed variates¹⁰ when $n > 1$ and $\bar{x}^2 < 4(n - 1)$ and also to the forms arising
in Section IIa, Case I, when $\bar{x} < \dfrac{n}{\gamma}$.

Even though we have a discontinuity at the origin, the total area under the
curve is one, which is evident since we can integrate function (25) over the entire
range of $t'$ when $n$ is odd and positive.

Case II. $0 < n < 1$.

This case includes the distribution obtained by taking positive integral roots
of a set of variates. As in the study of the normal distribution,¹¹ we limit our
considerations to the principal real values of the functions. When $n$ is odd, there
is a minimum at $t' = 0$ and a maximum given by each of the two signs before
the radical in (28). Hence in this case, we have one minimum and two maxima.

With the values for $n$ and $\alpha_3$ in (24), $t_{md.} = -.164$ and $t_{mo.} = -.500$. The
transformed distribution gives $t'_{md.} = -.547$ and two modes, the primary mode
$t'_{mo.} = -.967$, and the secondary mode $t'_{mo.} = .903$. In contrast to the
corresponding transformation of normally distributed variates, the primary mode
is less than the median.
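The quoted modes and median can be reproduced from (28). The sketch below assumes $\alpha_3 = 1$ and $n = 1/3$, values inferred here from $t_{mo.} = -\alpha_3/2 = -.500$ and from the heading of Table III; the function names are illustrative assumptions.

```python
import math


def stationary_t(alpha3, n):
    """Roots of t**2 + (n*alpha3/2)*t + (n - 1) = 0, i.e. values of t'**(1/n) in (28)."""
    b = n * alpha3 / 2.0
    disc = b * b - 4.0 * (n - 1.0)
    if disc < 0.0:
        return []                       # (28) is imaginary: no stationary point
    root = math.sqrt(disc)
    return [(-b - root) / 2.0, (-b + root) / 2.0]


def signed_power(t, n):
    """Principal real value of t**n, keeping the sign of t (odd roots)."""
    return math.copysign(abs(t) ** n, t)


if __name__ == "__main__":
    alpha3, n = 1.0, 1.0 / 3.0
    modes = [signed_power(t, n) for t in stationary_t(alpha3, n)]
    print([round(m, 3) for m in modes])       # primary and secondary modes
    print(round(signed_power(-0.164, n), 3))  # transformed median
```

The median transforms directly because $t' = t^{1/3}$ is monotonic, whereas the modes must be recomputed from (28).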


⁹ The expression $t_l$ represents the lower bound of $t$ in distribution (24).
¹⁰ H. L. Rietz, Cf. loc. cit. p. 296.
¹¹ Cf. Rietz, loc. cit. p. 297.





TABLE II

Comparison of the Type III Distribution and the Transformed Distribution when
$\alpha_3 = .6$, $n = 3$

    t        f(t)          t'            f(t')

  -3.00    .000001    -27.000000        .000000
  -2.50    .001347    -15.625000        .000072
  -2.00    .029467     -8.000000        .002456
  -1.75    .072787     -5.359375        .007922
  -1.50    .139285     -3.375000        .020635
  -1.25    .220462     -1.953125        .047032
  -1.00    .301350     -1.000000        .100450
   -.50    .405345      -.125000        .540460
   -.25    .414211      -.015625       2.209125
   -.10    .406131      -.001000      13.537700
   -.05    .401485      -.000125      53.531333
   -.02    .398272      -.000008     331.893333
    .00    .395962       .000000        ∞
    .02    .393522       .000008     327.935000
    .05    .389628       .000125      51.950400
    .10    .382549       .001000      12.751633
    .15    .374795       .003375       5.552519
    .25    .357533       .015625       1.906843
    .50    .307293       .125000        .409724
    .75    .252971       .421875        .149909
   1.00    .200493      1.000000        .066831
   1.50    .114233      3.375000        .016923
   2.00    .058376      8.000000        .004865
   2.50    .027285     15.625000        .001455
   3.00    .011836     27.000000        .000438
   3.50    .004820     42.875000        .000131
   4.00    .001859     64.000000        .000039
   5.00    .000242    125.000000        .000003
   6.00    .000027    216.000000        .000000





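Case III. n < 0.
Let n = -m, where m is a positive integer. Then (25) becomes

$$f(t') = \frac{1}{m}\,|t'|^{-\frac{m+1}{m}}\,\Big[f(t)\Big]_{t = t'^{-1/m}}, \tag{29}$$

with f(t) the Type III function (24).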







TABLE III
Comparison of the Type III Distribution and the Transformed Distribution When
$\alpha_3 = 1$, $n = 1/3$

    t       f(t)         t'         f(t')

  -2.00   0          -1.259921     0
  -1.75   .025272    -1.205071     .11010
  -1.50   .122626    -1.144714     .48206
  -1.25   .251021    -1.077217     .87385
  -1.00   .360894    -1.000000    1.08268
   -.90   .393277     -.965489    1.09998
   -.75   .427526     -.908560    1.05874
   -.50   .448084     -.793701     .84683
   -.27   .433958     -.646330     .54385
   -.08   .405678     -.430887     .22596
    .00   .390734      .000000     0
    .08   .374536      .430887     .20861
    .27   .332927      .646330     .41723
    .50   .280748      .793701     .53058
    .64   .249865      .861774     .55669
    .74   .228711      .904504     .56134
    .90   .196904      .965489     .55064
   1.00   .178470     1.000000     .53541
   1.50   .104259     1.144714     .40985
   2.00   .057252     1.259921     .27265
   2.50   .029989     1.357209     .16572
   3.00   .015133     1.442250     .09443
   3.50   .007410     1.518295     .05125
   4.00   .003539     1.587401     .02675
   4.50   .001655     1.650964     .01353
   5.00   .000761     1.709976     .00668
   5.50   .000344     1.765174     .00322


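Taking the derivative,

$$\frac{df(t')}{dt'} = -f(t')\left\{m\,t'^{\frac{m+2}{m}}\left(1 + \frac{\alpha_3}{2}\,t'^{-\frac{1}{m}}\right)\right\}^{-1}\left\{(m+1)\,t'^{\frac{2}{m}} + \frac{m\alpha_3}{2}\,t'^{\frac{1}{m}} - 1\right\}, \tag{30}$$

and in place of (28) we have

$$t'^{\frac{1}{m}} = \frac{-\frac{m\alpha_3}{2} \pm \sqrt{\left(\frac{m\alpha_3}{2}\right)^2 + 4m + 4}}{2(m+1)}. \tag{31}$$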


The transformed distribution has little statistical significance for odd values
of m, since f(t') is a disjoined distribution. There are no values for f(t') in the
interval $\left(-\frac{\alpha_3}{2}\right)^m < t' < 0$, since $-\frac{2}{\alpha_3} < t < \infty$. The transformed distribution is
thus composed of two sections, each with its own mode. The section for negative
values of t', with range $-\infty < t' < \left(-\frac{\alpha_3}{2}\right)^m$, has a mode given by (31) with the
negative sign before the radical. The section for positive values of t', with range
$0 < t' < \infty$, has a mode given by (31) with the positive sign before the radical.

When m is an even integer, if we assign to f(t') the value 0 when t' = 0, f(t')
becomes a continuous unimodal distribution in the interval $0 \le t' < \infty$, with
the mode given by (31), with the positive sign before the radical.


III. LOGARITHMIC TRANSFORMATIONS


As indicated in the introduction, numerous studies have been made of the
distributions obtained by replacing normally distributed variates by exponential
functions of the variates. If a variate x, with range $-\infty < x < \infty$, is
distributed normally with mean zero and unit variance, then by replacing x by x',
where $x' = c + e^{kx}$, the range of x' becomes $c < x' < \infty$. Likewise if a variate x
is distributed in accord with a Type III law, with range $0 < x < \infty$, then in
making the above transformation, the range of x' becomes $(c + 1) < x' < \infty$.
We shall now study the properties of the distribution of x' obtained by the
above transformation applied to distribution (1). Because of the similarity of
the properties of the transformed frequency distributions, we shall take k = 1
and c = 0.


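Letting $x' = e^{x}$, (1) is transformed into

$$f(x') = y_0\,(\log x')^{\gamma\bar{x}-1}\,x'^{-(\gamma+1)}, \qquad 1 < x' < \infty. \tag{32}$$

Then,

$$\frac{df(x')}{dx'} = f(x')\{x'\log x'\}^{-1}\{(\gamma\bar{x} - 1) - (\gamma + 1)\log x'\}. \tag{33}$$

The derivative changes signs at

$$x'_{mo} = e^{\frac{\gamma\bar{x}-1}{\gamma+1}}. \tag{34}$$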


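The arithmetic mean of the x''s distributed in accord with (32) is given by

$$\bar{x}' = y_0\int_1^{\infty}(\log x')^{\gamma\bar{x}-1}\,x'^{-\gamma}\,dx' = y_0\int_0^{\infty}x^{\gamma\bar{x}-1}e^{-x(\gamma-1)}\,dx = \left(\frac{\gamma}{\gamma-1}\right)^{\gamma\bar{x}}, \quad \text{if } \gamma > 1. \tag{35}$$

The integral is divergent when $0 < \gamma \le 1$, hence for these cases we take
$0 < k < \gamma$, then

$$\bar{x}' = y_0\int_0^{\infty}x^{\gamma\bar{x}-1}e^{-x(\gamma-k)}\,dx = \left(\frac{\gamma}{\gamma-k}\right)^{\gamma\bar{x}}. \tag{36}$$

Likewise in order that the first s moments about the origin be finite when
$0 < \gamma \le s$, we must have $0 < k < \gamma/s$.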

The median of the distribution of x''s corresponds to

$$x_{md} = \bar{x} - \frac{1}{c\gamma} = \log x'_{md},$$

hence

$$x'_{md} = e^{\bar{x}-\frac{1}{c\gamma}}. \tag{37}$$


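1. The relative positions of the averages. We have $e^{\frac{\gamma\bar{x}-1}{\gamma+1}} < e^{\bar{x}-\frac{1}{c\gamma}}$, since

$$\frac{\gamma\bar{x}-1}{\gamma+1} < \bar{x} - \frac{1}{c\gamma}.$$

Therefore

$$x'_{mo} < x'_{md}. \tag{38}$$

Also, $\bar{x} - \frac{1}{c\gamma} < \bar{x}$. Hence

$$\gamma\bar{x}\log\left(\frac{\gamma}{\gamma-1}\right) = \bar{x}\left[1 + \frac{1}{2\gamma} + \frac{1}{3\gamma^2} + \cdots\right] > \bar{x} - \frac{1}{c\gamma},$$

that is, $e^{\bar{x}-\frac{1}{c\gamma}} < \left(\frac{\gamma}{\gamma-1}\right)^{\gamma\bar{x}}$, and hence

$$x'_{md} < \bar{x}'. \tag{39}$$

From (38) and (39), we have

$$x'_{mo} < x'_{md} < \bar{x}'. \tag{40}$$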
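The ordering of the three averages can be illustrated numerically. This is a minimal sketch, assuming distribution (1) is the gamma form with shape $\gamma\bar{x}$ and rate $\gamma$, and taking c = 3 for the median constant (the text only says that c lies in the neighborhood of 3); the function name and the sample values of $\gamma$ and $\bar{x}$ are illustrative.

```python
import math

def averages(gamma_, xbar, c=3.0):
    """Mode (34), approximate median (37) and mean (35) of x' = e**x."""
    shape = gamma_ * xbar
    mode = math.exp((shape - 1.0) / (gamma_ + 1.0))
    median = math.exp(xbar - 1.0 / (c * gamma_))   # c = 3 is an assumed value
    mean = (gamma_ / (gamma_ - 1.0)) ** shape      # finite only when gamma_ > 1
    return mode, median, mean

mode, median, mean = averages(4.0, 2.0)
```

For these sample values the mode falls below the median and the median below the mean, as (40) states.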


We shall now investigate the locations of the various averages as related to
the upper and lower points of inflection whose abscissas will be denoted by $I'_u$
and $I'_l$ respectively. Taking the second derivative,

$$\frac{d^2f(x')}{dx'^2} = f(x')\{x'\log x'\}^{-2}\{(\gamma\bar{x}-1)(\gamma\bar{x}-2) - (2\gamma^2\bar{x} + 3\gamma\bar{x} - 2\gamma - 3)\log x' + (\gamma+1)(\gamma+2)(\log x')^2\}.$$





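The points of inflection are given by

$$x' = e^{\phi(\gamma,\bar{x})}, \tag{41}$$

where

$$\phi(\gamma,\bar{x}) = \frac{(\gamma\bar{x}-1)(2\gamma+3) \pm \sqrt{(\gamma\bar{x}-1)\{(\gamma\bar{x}-1) + 4(\gamma+1)(\gamma+2)\}}}{2(\gamma+1)(\gamma+2)},$$

the negative and positive signs giving $\phi_l$ and $\phi_u$ respectively.

(a). To show $x'_{mo} > I'_l$.

We have,

$$\frac{\gamma\bar{x}-1}{\gamma+1} > \frac{(\gamma\bar{x}-1)(2\gamma+3) - (\gamma\bar{x}-1)(\rho+1)}{2(\gamma+1)(\gamma+2)},$$

where

$$\rho + 1 = \sqrt{1 + \frac{4(\gamma+1)(\gamma+2)}{\gamma\bar{x}-1}},$$

since $2\gamma + 4 > 2\gamma + 2 - \rho$.
Therefore

$$e^{\frac{\gamma\bar{x}-1}{\gamma+1}} > e^{\phi_l(\gamma,\bar{x})}.$$

(b). To show $x'_{mo} < I'_u$.
We have,

$$\frac{\gamma\bar{x}-1}{\gamma+1} < \frac{(\gamma\bar{x}-1)(2\gamma+3) + (\gamma\bar{x}-1)(\rho+1)}{2(\gamma+1)(\gamma+2)},$$

since $2\gamma + 4 < 2\gamma + 4 + \rho$.
Therefore,

$$e^{\frac{\gamma\bar{x}-1}{\gamma+1}} < e^{\phi_u(\gamma,\bar{x})}.$$

From (a) and (b), we have

$$I'_l < x'_{mo} < I'_u. \tag{42}$$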
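A small numerical sketch of the inflection points may help. It assumes the second derivative of the transformed density has the factor $(\gamma\bar{x}-1)(\gamma\bar{x}-2) - (\gamma\bar{x}-1)(2\gamma+3)\log x' + (\gamma+1)(\gamma+2)(\log x')^2$, whose two roots in $\log x'$ are the exponents in (41); the function names and sample values are illustrative only.

```python
import math

def phi(gamma_, xbar, sign):
    """Exponent of an inflection point in (41); sign = -1 (lower) or +1 (upper)."""
    a = gamma_ * xbar - 1.0
    q = (gamma_ + 1.0) * (gamma_ + 2.0)
    radical = math.sqrt(a * (a + 4.0 * q))
    return (a * (2.0 * gamma_ + 3.0) + sign * radical) / (2.0 * q)

def curvature_factor(log_x, gamma_, xbar):
    """Bracketed factor of the second derivative, as a quadratic in log x'."""
    a = gamma_ * xbar - 1.0
    return (a * (a - 1.0) - a * (2.0 * gamma_ + 3.0) * log_x
            + (gamma_ + 1.0) * (gamma_ + 2.0) * log_x ** 2)

g, xb = 4.0, 2.0
lower, upper = phi(g, xb, -1), phi(g, xb, +1)
mode_exponent = (g * xb - 1.0) / (g + 1.0)
```

For the sample values the exponent of the mode lies between the two inflection exponents, in agreement with (42).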


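(c). To show $x'_{md} > I'_l$.
We have,

$$x'_{md} > x'_{mo} \quad \text{and} \quad x'_{mo} > I'_l.$$

Consequently

$$x'_{md} > I'_l.$$

(d). To show the conditions under which $x'_{md}$ is less than or greater than $I'_u$.
Upon simplifying the inequality we find that $e^{\phi_u(\gamma,\bar{x})}$ will be greater or less
than $e^{\bar{x}-\frac{1}{c\gamma}}$ according as the expression

$$-2\bar{x}^2 + \bar{x}\left\{\frac{1}{c}\left(3 + \frac{4}{\gamma}\right) + (\gamma - 3)\right\} + \frac{1}{c}\left(2 + \frac{3}{\gamma}\right) - \frac{1}{c^2}\left(\frac{3}{\gamma} + \frac{2}{\gamma^2}\right) - 2 - \frac{1}{c^2} \tag{43}$$

is positive or negative. But (43) will be negative for all values of $\bar{x}$ if its
discriminant is negative or zero. Upon further examination it can be seen that the
discriminant will be negative or zero according as

$$\gamma^2 - 6\gamma\left(1 - \frac{1}{c}\right) + \frac{1}{c}\left(6 + \frac{1}{c}\right) - 7 \le 0. \tag{44}$$

The quadratic equation in $\gamma$, given by (44), factors into

$$\left[\gamma - 3\left(1 - \frac{1}{c}\right) - R\right]\left[\gamma - 3\left(1 - \frac{1}{c}\right) + R\right] \le 0,$$

where

$$R = \frac{1}{2}\sqrt{36\left(1 - \frac{1}{c}\right)^2 - \frac{4}{c}\left(6 + \frac{1}{c}\right) + 28}.$$

Hence in order that $x'_{md}$ be greater than $I'_u$ for all values of $\bar{x}$, we must have

$$3\left(1 - \frac{1}{c}\right) - \frac{1}{2}\sqrt{36\left(1 - \frac{1}{c}\right)^2 - \frac{4}{c}\left(6 + \frac{1}{c}\right) + 28} \;\le\; \gamma \;\le\; 3\left(1 - \frac{1}{c}\right) + \frac{1}{2}\sqrt{36\left(1 - \frac{1}{c}\right)^2 - \frac{4}{c}\left(6 + \frac{1}{c}\right) + 28}.$$

When c lies in the neighborhood of 3, $\gamma$ must be less than 5. Proceeding further,
we can divide (43) by negative 2, reverse the inequalities, and factor the
expression into

$$(\bar{x} - A - B)(\bar{x} - A + B),$$

where

$$A = \frac{1}{4}\left\{\frac{1}{c}\left(3 + \frac{4}{\gamma}\right) + (\gamma - 3)\right\},$$

$$B = \sqrt{A^2 + \frac{1}{2}\left\{\frac{1}{c}\left(2 + \frac{3}{\gamma}\right) - \frac{1}{c^2}\left(\frac{3}{\gamma} + \frac{2}{\gamma^2}\right) - 2 - \frac{1}{c^2}\right\}}.$$

If $|\bar{x} - A| < B$, then

$$x'_{md} < I'_u, \tag{45}$$

and if $|\bar{x} - A| > B$,

$$x'_{md} > I'_u. \tag{46}$$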
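The bound on $\gamma$ quoted for c near 3 can be reproduced with a few lines. The sketch assumes that condition (44) reads $\gamma^2 - 6\gamma(1 - 1/c) + (1/c)(6 + 1/c) - 7 \le 0$, which is a reconstruction of a damaged equation; the function names are illustrative.

```python
import math

def discriminant_44(gamma_, c):
    """Left-hand side of (44) as read here (an assumption about the damaged original)."""
    return gamma_ ** 2 - 6.0 * gamma_ * (1.0 - 1.0 / c) + (1.0 / c) * (6.0 + 1.0 / c) - 7.0

def gamma_range(c):
    """Roots of the quadratic in (44); the condition holds between them."""
    centre = 3.0 * (1.0 - 1.0 / c)
    half = 0.5 * math.sqrt(36.0 * (1.0 - 1.0 / c) ** 2
                           - (4.0 / c) * (6.0 + 1.0 / c) + 28.0)
    return centre - half, centre + half

low, high = gamma_range(3.0)
```

With c = 3 the upper root falls just below 5 and the lower root is negative, which is consistent with the statement that $\gamma$ must be less than 5.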


(e). To show $I'_u < \bar{x}'$.

We have, $I'_l < \bar{x}'$, since $\bar{x}' > x'_{md} > x'_{mo}$ and $x'_{mo} > I'_l$. Also, $I'_u < \bar{x}'$ for
those values of $\gamma$ and $\bar{x}$ for which $I'_u < x'_{md}$. It remains to be shown that $\bar{x}'$ is
always greater than $I'_u$, for all values of $\gamma$ and $\bar{x}$. To show that

$$e^{\phi_u(\gamma,\bar{x})} < \left(\frac{\gamma}{\gamma-1}\right)^{\gamma\bar{x}},$$

it will be sufficient to show that $\phi_u(\gamma,\bar{x}) < \bar{x}$, since

$$\log\left(\frac{\gamma}{\gamma-1}\right)^{\gamma\bar{x}} = \bar{x}\left(1 + \frac{1}{2\gamma} + \frac{1}{3\gamma^2} + \cdots\right) > \bar{x}.$$

The inequality is satisfied if

$$(\gamma\bar{x}-1)\{(\gamma\bar{x}-1) + 4(\gamma+1)(\gamma+2)\} < [2\bar{x}(\gamma+1)(\gamma+2) - (\gamma\bar{x}-1)(2\gamma+3)]^2.$$
This expression, however, reduces to the condition that we must have 

—2 — yi — 62 — GF — yt — 8F <0, 


which is always true. Hence $I'_u < \bar{x}'$,


and we have 


(47) 


2. Contact at the ends of the range. We shall now investigate whether the
frequency function, f(x'), has high contact with the x' axis. The function,
f(x'), vanishes at both limits, and thus it will be sufficient to test the derivatives.
The nth derivative can be expressed in the form of $f(x')(x'\log x')^{-n}$ multiplied
by an nth degree polynomial in log x'. It can easily be shown that each
derivative will vanish at the upper limit, while the nth derivative will vanish at the
lower limit provided $\gamma\bar{x} > n$. Therefore, f(x') does not have high contact at the
lower end of the range.


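3. Moments. The sth moment about the origin is given by

$$\overline{x'^{\,s}} = y_0\int_1^{\infty}(\log x')^{\gamma\bar{x}-1}\,x'^{\,s-\gamma-1}\,dx' = y_0\int_0^{\infty}x^{\gamma\bar{x}-1}e^{-x(\gamma-s)}\,dx = \left(\frac{\gamma}{\gamma-s}\right)^{\gamma\bar{x}}, \quad \text{if } \gamma > s. \tag{48}$$

If $\gamma \le s$, then taking k such that $\gamma > sk > 0$, we get in place of (48),

$$\overline{x'^{\,s}} = \left(\frac{\gamma}{\gamma-sk}\right)^{\gamma\bar{x}}. \tag{49}$$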







We easily obtain the recurring relationship 


(50) 


yz\8s 
- :) \ (log gy gat dx’ 
an 3) } ave” dx 
Y- 
yz 8 ; vz vz 
© G2) SGA CSN 
Y¥—S8 j=0 J 7-1 


If we do not take the value of k to be 1, then 


yz _s jvt nisl m 
ome (25) £00) (4) Ct)” 


oY y¥—s—jk 


IV. TRANSFORMATION INTO A NORMAL DISTRIBUTION 


We shall now consider a unimodal probability function y = f(x) with range,
a < x < b, and shall seek to express x as such a function of t as will transform
y = f(x), into $y = Ce^{-t^2/2}$. For simplicity, we assume that y = f(x) has its
modal value at x = 0, and thus each of the curves y = f(x) and $y = Ce^{-t^2/2}$ has its
one maximum value at the origin.
In y = f(x) let log y = V, then equating densities,¹² we have,

$$V - \log C + \frac{t^2}{2} = 0, \tag{52}$$

$$\frac{dV}{dt} + t = 0, \qquad \frac{d^2V}{dt^2} + 1 = 0, \qquad \frac{d^nV}{dt^n} = 0, \quad n > 2. \tag{53}$$

12 If f(x) is a probability function or density of a distribution, then f(x) dx is, to within
infinitesimals of higher order, the probability that a value taken at random will fall into
the interval dx at x.


















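Under the assumption that x is a function which can be expanded in a Maclaurin
series in powers of t, we shall use equations (53) to determine the values of
$\left.\frac{d^nx}{dt^n}\right|_{t=0}$ in the series

$$x = A_0 + A_1t + A_2\frac{t^2}{2!} + A_3\frac{t^3}{3!} + \cdots. \tag{54}$$

Let $v_n$ represent $\left.\frac{d^nV}{dx^n}\right|_{x=0}$, n = 0, 1, 2, ... . Then $v_0 = \log C$, and $v_1 = 0$, since $\frac{dV}{dx}$
and $\frac{dV}{dt}$ vanish when x = 0, that is when t = 0, since $A_0$ is taken to be zero.

Taking the second derivative,

$$\frac{d^2V}{dt^2} = \frac{d^2V}{dx^2}\left(\frac{dx}{dt}\right)^2 + \frac{d^2x}{dt^2}\left(\frac{dV}{dx}\right),$$

and

$$\left.\frac{d^2V}{dt^2}\right|_{t=0} = v_2A_1^2 = -1.$$

Therefore we have

$$A_1 = (-v_2)^{-1/2}, \quad \text{when } v_2 < 0. \tag{55}$$

Also,

$$\left.\frac{d^3V}{dt^3}\right|_{t=0} = v_3A_1^3 + 3v_2A_1A_2 = 0,$$

and we have

$$A_2 = \frac{v_3}{3v_2^2}. \tag{56}$$

Similarly,

$$A_3 = -\frac{5v_3^2 - 3v_2v_4}{12v_2^3}\,(-v_2)^{-1/2},$$

$$A_4 = -\frac{40v_3^3 - 45v_2v_3v_4 + 9v_2^2v_5}{45v_2^5}, \tag{57}$$

$$A_5 = -\frac{385v_3^4 - 630v_2v_3^2v_4 + 21v_2^2(8v_3v_5 + 5v_4^2) - 24v_2^3v_6}{144v_2^7}\,(-v_2)^{1/2}.$$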

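Though the procedure is straightforward, the work becomes somewhat
involved in determining $A_n$ as n gets larger. For this reason, we proceed in the
following manner. By the use of Burmann's¹³ theorem we can write (54) in
the form

$$x = \left[\frac{x}{t}\right]_{x=0}t + \left[\frac{d}{dx}\left(\frac{x}{t}\right)^2\right]_{x=0}\frac{t^2}{2!} + \left[\frac{d^2}{dx^2}\left(\frac{x}{t}\right)^3\right]_{x=0}\frac{t^3}{3!} + \cdots. \tag{58}$$

But

$$t = \sqrt{2}\,(\log C - V)^{1/2},$$

where V is a function of x. We have,

$$\log C = V\Big|_{x=0}, \qquad \frac{dV}{dx} = 0 \ \text{at}\ x = 0,$$

and we may write

$$t = \sqrt{2}\left\{-\left(\left.\frac{d^2V}{dx^2}\right|_{x=0}\frac{x^2}{2!} + \left.\frac{d^3V}{dx^3}\right|_{x=0}\frac{x^3}{3!} + \cdots\right)\right\}^{1/2} = \sqrt{2}\,\{-(a_2x^2 + a_3x^3 + a_4x^4 + \cdots)\}^{1/2}$$

$$= \sqrt{2}\,x\,\{-(a_2 + a_3x + a_4x^2 + \cdots)\}^{1/2}, \quad \text{where } a_n = \frac{v_n}{n!}.$$

We can now write,

$$\left(\frac{x}{t}\right)^n = (-2a_2)^{-\frac{n}{2}}\left(1 + \frac{a_3}{a_2}x + \frac{a_4}{a_2}x^2 + \cdots\right)^{-\frac{n}{2}}. \tag{59}$$

But

$$\left[\frac{d^{n-1}}{dx^{n-1}}\left(\frac{x}{t}\right)^n\right]_{x=0} = (n-1)!\ \text{multiplied by the coefficient of } x^{n-1} \text{ in (59)}.$$

Hence,

$$A_n = (n-1)!\,(-2a_2)^{-\frac{n}{2}}\sum\frac{\left(-\frac{n}{2}\right)\left(-\frac{n}{2}-1\right)\cdots\left(-\frac{n}{2}-p+1\right)}{\lambda_1!\,\lambda_2!\,\lambda_3!\cdots\lambda_n!}\left(\frac{a_3}{a_2}\right)^{\lambda_1}\left(\frac{a_4}{a_2}\right)^{\lambda_2}\cdots\left(\frac{a_{n+2}}{a_2}\right)^{\lambda_n}, \tag{60}$$

where the summation is over all values of $\lambda_s$ such that

$$\sum_{s=1}^{n}s\lambda_s = n - 1, \quad \text{and} \quad p = \sum_{s=1}^{n}\lambda_s.$$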


13 A. De Morgan, Differential and Integral Calculus, (1842) page 305. 











This expression can be written in the form, 









n n—2 


=(n—1)!(2a2) ? 2 (-""5- -7s. ues 


61) paeairaiies 
n+2s D”"* a3" 









‘tp a *** 
where D is the derivative operator of Arbogast.¹⁴
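If we take expression (1) as our function of x, we obtain,

$$v_n = (-1)^{n-1}\frac{(n-1)!\,\gamma^n}{(\gamma\bar{x}-1)^{n-1}},$$

which gives

$$x = \left(\bar{x} - \frac{1}{\gamma}\right) + \frac{(\gamma\bar{x}-1)^{1/2}}{\gamma}\,t + \frac{2}{3\gamma}\frac{t^2}{2!} + \frac{1}{6\gamma(\gamma\bar{x}-1)^{1/2}}\frac{t^3}{3!} - \frac{4}{45\gamma(\gamma\bar{x}-1)}\frac{t^4}{4!} + \frac{1}{36\gamma(\gamma\bar{x}-1)^{3/2}}\frac{t^5}{5!} + \cdots + A_n\frac{t^n}{n!} + \cdots, \tag{62}$$

where $A_n$ = (n - 1)! multiplied by the coefficient of $[x - (\bar{x} - 1/\gamma)]^{n-1}$ in

$$\frac{(\gamma\bar{x}-1)^{n/2}}{2^{n/2}\gamma^n}\left[\sum_{s=0}^{\infty}\frac{(-1)^s\gamma^s[x - (\bar{x} - 1/\gamma)]^s}{(s+2)(\gamma\bar{x}-1)^s}\right]^{-\frac{n}{2}}.$$

This series is known to diverge for large values of t. However, the series is
defined for those values of t that correspond to x for the interval $0 < x < 2\left(\bar{x} - \frac{1}{\gamma}\right)$.
With the aid of (22) and Salvosa's¹⁵ tables we give in Table IV the percentage of
the total population which is included in this interval.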


TABLE IV

The percentage of the population, characterized by (1), which is included in the interval
$0 < x < 2(\bar{x} - 1/\gamma)$, for different degrees of skewness

  α₃                       1.1      1.0       .9       .8       .7       .6
  Percent of population   79.386   84.880   89.781   93.908   97.021   98.959

  α₃                        .5       .4       .3       .2       .1
  Percent of population   99.805   99.990  100.000  100.000  100.000









Thus, in dealing with samples as large as 10,000, with moderate degrees of 
skewness, the probability of getting a value that falls beyond this interval 






14 Augustus De Morgan, On Arbogast's Formulae of Expansion, Cambridge and Dublin
Mathematical Journal, Vol. 1, (1848) pp. 238-255.
15 Cf. Salvosa, loc. cit. page 2.


becomes negligible. Hence it may be expected that with the use of a compara- 
tively few terms of series (62) we may transform the ordinates of a moderately 
skew Type III function to within close approximations of the ordinates of the 
normal function. 

Baker¹⁶ considered the transformations of a non-normal frequency distribution
represented by f(t)dt, where the origin is taken at some central point and the
scale is the standard deviation of the distribution. By equating probabilities
he found a function φ, such that by setting t = φ(u), he obtained

$$f[\phi(u)]\,\phi'(u)\,du = \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du.$$


It seemed of interest to compare the results obtained by applying transformation
(62), which is found by equating densities, to the illustration treated by
Baker,¹⁷ where the transformation giving equality of probabilities was used.
The example treated was

$$f(t) = .9929\,(1 + .1t)^{99}\,e^{-10t}. \tag{63}$$

This is a Type III distribution of form (24), with $\alpha_3 = .2$. From (22),
$\gamma\bar{x} = \frac{4}{\alpha_3^2} = 100$, and from (62) we obtain the series

$$x = \frac{99}{\gamma}\,(1 + .1005038u + .0033670u^2 + .0000282u^3 - .0000004u^4 + \cdots). \tag{64}$$

We shall utilize only the first four terms and rewrite (64) in the form

$$\gamma x = 99\,(1 + .1005038u + .0033670u^2 + .0000282u^3).$$
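The numerical coefficients of (64) follow from the general terms of series (62). A minimal sketch, assuming that multiplying (62) by $\gamma$ and factoring out $\gamma\bar{x} - 1$ gives the coefficients $1/\sqrt{a}$, $1/(3a)$, $1/(36a^{3/2})$ and $-1/(270a^2)$ with $a = \gamma\bar{x} - 1$; the function name is illustrative.

```python
import math

def series_64_coefficients(gamma_xbar):
    """Coefficients of u, u^2, u^3, u^4 in gamma*x = a*(1 + ...), with a = gamma*xbar - 1."""
    a = gamma_xbar - 1.0
    return [
        1.0 / math.sqrt(a),          # from A_1 = sqrt(a)/gamma
        1.0 / (3.0 * a),             # from A_2/2! with A_2 = 2/(3*gamma)
        1.0 / (36.0 * a ** 1.5),     # from A_3/3! with A_3 = 1/(6*gamma*sqrt(a))
        -1.0 / (270.0 * a ** 2),     # from A_4/4! with A_4 = -4/(45*gamma*a)
    ]

coefficients = series_64_coefficients(100.0)
```

With $\gamma\bar{x} = 100$ these round to the four decimal coefficients printed in (64).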


However, from (23),

$$\gamma x = \gamma\bar{x}\left(1 + \frac{\alpha_3 t}{2}\right),$$

which gives

$$\gamma x = 100\,(1 + .1t).$$

Therefore,

$$t = .1\,(\gamma x - 100).$$


16 G. A. Baker, Transformation of Non-normal Frequency Distributions into Normal 
Distributions, Annals of Mathematical Statistics, Vol. 5, (1934) pp. 113-123. 
17 Baker, loc. cit. page 117. 





With the aid of Salvosa's¹⁸ tables, we obtain the following results.

TABLE V
Comparison of the ordinates of the normal function, the function with skewness .2,
and the skewed function transformed by
t = 9.9 (-.0101010 + .1005038 u + .0033670 u² + .0000282 u³)

  t or u   Normal curve   Skew curve   Transformed   Transformed skew curve
                          (α₃ = .2)    skew curve    minus normal

  -2.0      .053991       .049243      .054226       .000235
  -1.8      .078950       .076810      .079291       .000341
  -1.6      .110921       .112956      .111393       .000472
  -1.4      .149727       .157043      .150359       .000632
  -1.2      .194186       .206951      .195003       .000817
  -1.0      .241971       .259120      .242986       .001015
   -.8      .289692       .308958      .290905       .001213
   -.6      .333225       .351538      .334618       .001393
   -.4      .368270       .382453      .369811       .001541
   -.2      .391043       .398583      .392678       .001635
    .0      .398942       .398610      .400615       .001673
    .2      .391043       .383157      .392682       .001639
    .4      .368270       .354545      .369811       .001541
    .6      .333225       .316273      .334621       .001396
    .8      .289692       .272360      .290905       .001213
   1.0      .241971       .226714      .242984       .001013
   1.2      .194186       .182641      .194999       .000823
   1.4      .149727       .142563      .150353       .000626
   1.6      .110921       .107939      .111383       .000462
   1.8      .078950       .079354      .079277       .000327
   2.0      .053991       .056702      .054214       .000223


The ordinates of the transformed distribution are more symmetrical and 
approximate the ordinates of the normal curve more closely than the values 
obtained by Baker even though we have used only four terms in the transforming 
series. 
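The comparison in Table V can be recomputed directly. The sketch below is an illustration under stated assumptions: the skew curve is taken to be the standardized Type III function with $\gamma\bar{x} = 100$ and $\gamma = 10$, its constant is computed from the normalization (about .398610, the central ordinate of the skew column) rather than taken from (63) as printed, and the transformed ordinate is the skew ordinate evaluated at t(u). All names are illustrative.

```python
import math

# log of the normalizing constant: 10**100 * 10**99 * exp(-100) / Gamma(100)
LOG_CONST = 199.0 * math.log(10.0) - 100.0 - math.lgamma(100.0)

def normal_ordinate(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def skew_ordinate(t):
    """Standardized Type III ordinate with skewness .2 (assumed gamma form)."""
    base = 1.0 + 0.1 * t
    if base <= 0.0:
        return 0.0
    return math.exp(LOG_CONST + 99.0 * math.log(base) - 10.0 * t)

def t_of_u(u):
    """Four-term transforming series from the caption of Table V."""
    return 9.9 * (-0.0101010 + 0.1005038 * u + 0.0033670 * u ** 2 + 0.0000282 * u ** 3)

def transformed_ordinate(u):
    return skew_ordinate(t_of_u(u))

rows = [(k / 5.0, normal_ordinate(k / 5.0), skew_ordinate(k / 5.0),
         transformed_ordinate(k / 5.0)) for k in range(-10, 11)]
```

The list `rows` holds the first four columns for u = -2.0, -1.8, ..., 2.0; the fifth column is the difference between the last and the second entries of each row.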

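Returning to the general case, we may write

$$\int_a^b y\,dx = \int_{-\infty}^{\infty}Ce^{-t^2/2}\,\frac{dx}{dt}\,dt = C\int_{-\infty}^{\infty}e^{-t^2/2}\left(A_1 + A_2t + A_3\frac{t^2}{2!} + A_4\frac{t^3}{3!} + \cdots\right)dt, \tag{65}$$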


18 Salvosa, loc. cit. pp. 64 et seq. 













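provided the series converges for all values of t. Under the assumption that the
integrand satisfies conditions for the term by term integration of the series,
we get

$$1 = \int_a^b y\,dx = C\sqrt{2\pi}\left\{A_1 + \frac{A_3}{2} + \frac{3A_5}{4!} + \cdots + \frac{1\cdot 3\cdots(2n-1)}{(2n)!}A_{2n+1} + \cdots\right\}. \tag{66}$$

The area to the right of the modal ordinate is

$$\int_{x_{mo}}^{b}y\,dx = C\int_0^{\infty}e^{-t^2/2}\left(A_1 + A_2t + \cdots\right)dt$$

$$= C\int_0^{\infty}e^{-t^2/2}\left(A_1 + A_3\frac{t^2}{2!} + \cdots\right)dt + C\int_0^{\infty}e^{-t^2/2}\left(A_2t + A_4\frac{t^3}{3!} + \cdots + A_{2n}\frac{t^{2n-1}}{(2n-1)!} + \cdots\right)dt$$

$$= \frac{1}{2} + C\left(A_2 + \frac{A_4}{3} + \cdots\right). \tag{67}$$

Hence the area from the mode to the median is

$$C\left(A_2 + \frac{A_4}{3} + \cdots\right). \tag{68}$$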


Let us consider distribution (1) again. The coefficients in series (62) are 
functions of the skewness, and become smaller with smaller degrees of skewness. 
Indications are that with moderate skewness, the series converges sufficiently 
to be used for certain formal purposes. If we assume this and proceed in a 
formal manner we obtain some interesting results that are consistent with 
approximations that have been obtained elsewhere. 

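Thus, it is interesting to note that using the coefficients of series (62) in
equation (66), we obtain

$$\Gamma(\gamma\bar{x}) = \sqrt{2\pi(\gamma\bar{x}-1)}\;(\gamma\bar{x}-1)^{\gamma\bar{x}-1}\,e^{-(\gamma\bar{x}-1)}\left\{1 + \frac{1}{12(\gamma\bar{x}-1)} + \frac{1}{288(\gamma\bar{x}-1)^2} + \cdots\right\},$$

which is Stirling's asymptotic form for $\Gamma(\gamma\bar{x})$.¹⁹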
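As a quick numerical illustration of the asymptotic form above, the three terms shown can be compared with the gamma function itself. The function name is illustrative, and the argument z plays the part of $\gamma\bar{x}$.

```python
import math

def stirling_gamma(z):
    """Three-term Stirling form of Gamma(z), written in terms of a = z - 1."""
    a = z - 1.0
    log_lead = 0.5 * math.log(2.0 * math.pi) + (a + 0.5) * math.log(a) - a
    return math.exp(log_lead) * (1.0 + 1.0 / (12.0 * a) + 1.0 / (288.0 * a * a))

relative_error = abs(stirling_gamma(100.0) / math.gamma(100.0) - 1.0)
```

For $\gamma\bar{x} = 100$ (skewness .2) the relative error of the three-term form is far below one part in a million, which is in keeping with the remark that the series behaves well for moderate skewness.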
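From (68), the area from the mode to the median in the Type III distribution
characterized by (1) is approximately

$$C\left(\frac{2}{3\gamma} - \frac{4}{135\gamma(\gamma\bar{x}-1)} + \cdots\right), \tag{69}$$

where

$$C = \frac{\gamma(\gamma\bar{x}-1)^{\gamma\bar{x}-1}e^{-(\gamma\bar{x}-1)}}{\Gamma(\gamma\bar{x})}.$$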





19 E. Czuber, Wahrscheinlichkeitsrechnung, Volume 1, (1908) pp. 23-24.











Since $(\gamma\bar{x} - 1)$ is large when the skewness is moderate, and since the terms of (69)
are rapidly decreasing, the area from the mode to the median is approximately
equal to $\frac{2C}{3\gamma}$. But $\frac{1}{\gamma}$ is the distance from the mode to the mean and C is the
ordinate at the mode, hence the area from the mode to the median is
approximately equal to the ordinate at the mode multiplied by 2/3 of the distance
between the mode and mean. Therefore with moderate skewness the median is
approximately 2/3 of the distance between the mode and mean, which conforms
to the approximate result first obtained by Karl Pearson²⁰ for the Type III
distribution. We may, for all cases resulting in (68), take $A_2$ as being
approximately equal to the distance from the mode to the median. This becomes
somewhat more apparent by finding the arithmetic mean of distribution y.
Thus,

$$\bar{x} = \frac{C\displaystyle\int_{-\infty}^{\infty}e^{-t^2/2}\left(A_0 + A_1t + A_2\frac{t^2}{2!} + \cdots\right)\left(A_1 + A_2t + A_3\frac{t^2}{2!} + \cdots\right)dt}{C\sqrt{2\pi}\left(A_1 + \frac{A_3}{2} + \frac{3A_5}{4!} + \cdots\right)}$$

$$= \frac{A_0A_1 + \left(\frac{3A_1A_2}{2} + \frac{A_0A_3}{2}\right) + 3\left(\frac{A_0A_5}{4!} + \frac{5A_1A_4}{4!} + \frac{5A_2A_3}{12}\right) + \cdots}{A_1 + \frac{A_3}{2} + \frac{3A_5}{4!} + \cdots}$$

$$= A_0 + \frac{3A_2}{2} + \frac{15A_4}{4!} + \cdots. \tag{70}$$

Remembering that $A_0$ is the abscissa of the mode, it becomes apparent that the
mean is, in general, approximately equal to the mode plus 3/2 of the distance
from the mode to the median.
Though series (62) is known not to converge for large values of t, it is
interesting to note that if we use distribution (1) for y, we have from (70)

$$\bar{x} = \left(\bar{x} - \frac{1}{\gamma}\right) + \frac{3}{2}\left(\frac{2}{3\gamma}\right) - \frac{1}{3\gamma(\gamma\bar{x}-1)\,3!} + \cdots, \tag{71}$$

the first two terms of which give $\bar{x}$; hence if (71) were an exact
formula, the sum of the terms beyond the second would be zero.


For example (63), it can be seen from the following that $A_2$ furnishes a close
approximation to the distance from the mode to the median. Here, $t = .1(\gamma x - 100)$;
putting $x = \bar{x} - \frac{1}{\gamma}$ we have $t_{mo} = .1(\gamma\bar{x} - 101) = -.1$. Putting
$x = \left(\bar{x} - \frac{1}{\gamma}\right) + \frac{2}{3\gamma}$, where $A_2 = \frac{2}{3\gamma}$, we have as our approximation to the
median

$$t_{md} = .1\left(\gamma\bar{x} - \frac{301}{3}\right) = -.03333.$$

Interpolating in the Salvosa tables, we find for $\alpha_3 = .2$, $t_{md} = -.03331$
approximately. Hence it is seen that the interpolated value checks very closely with
that obtained by using the $A_2$ criterion.

20 Karl Pearson, loc. cit.
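The median quoted from the Salvosa tables can be approximated without tables. This sketch assumes the example is the gamma distribution with shape 100 and rate 10, so that t = x - 10; it evaluates the distribution function with the standard power series for the incomplete gamma function and locates the median by bisection. Function names are illustrative.

```python
import math

def gamma_cdf(x, shape, rate):
    """P(X <= x) for a gamma variate, via the power series of the incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-17 * abs(total):
        n += 1
        term *= z / (shape + n)
        total += term
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total

def gamma_median(shape, rate):
    """Median by bisection on the distribution function."""
    low, high = 0.0, (shape + 10.0 * math.sqrt(shape)) / rate
    for _ in range(80):
        mid = 0.5 * (low + high)
        if gamma_cdf(mid, shape, rate) < 0.5:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

t_median = gamma_median(100.0, 10.0) - 10.0      # standardized median
t_approx = 0.1 * (100.0 - 301.0 / 3.0)           # value from the A_2 criterion
```

Under these assumptions the computed median and the $A_2$ approximation differ only in the fifth decimal place, as the text reports.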

We shall now consider briefly the transforming series, when for y, we take
distribution (4). Then, corresponding to (54), we obtain the series

$$x' = \frac{(\gamma\bar{x}-n)^n}{\gamma^n} + \frac{n(\gamma\bar{x}-n)^{n-\frac{1}{2}}}{\gamma^n}\,t + \frac{n(3n-1)(\gamma\bar{x}-n)^{n-1}}{3\gamma^n}\frac{t^2}{2!} + \frac{n(6n^2-6n+1)(\gamma\bar{x}-n)^{n-\frac{3}{2}}}{6\gamma^n}\frac{t^3}{3!}$$

$$+ \frac{n(45n^3-90n^2+45n-4)(\gamma\bar{x}-n)^{n-2}}{45\gamma^n}\frac{t^4}{4!} + \cdots. \tag{72}$$

When n = 1, (72) reduces to the series given by (62). Suppose we are primarily
interested in the cases for which 0 < n < 1. For these cases the coefficients of
(72) decrease more rapidly than do those of series (62). Under the same
assumptions as to convergence which were made in working with the latter
series we have, from (68) and (72), the area from the mode to the median given
approximately by

$$C\left\{\frac{n(3n-1)(\gamma\bar{x}-n)^{n-1}}{3\gamma^n} + \frac{n(45n^3-90n^2+45n-4)(\gamma\bar{x}-n)^{n-2}}{135\gamma^n} + \cdots\right\}. \tag{73}$$

When 0 < n < 1 we always have $\gamma\bar{x} > n$; then $A_2 > 0$ if n > 1/3, and $A_2 < 0$
if n < 1/3. Therefore, if $A_2$ is taken to be approximately equal to the distance
from the mode to the median, we have $x'_{mo} > x'_{md}$ if n < 1/3, and $x'_{mo} < x'_{md}$
if n > 1/3, since $A_2$ is positive or negative according as n is greater or less than
1/3. Combining these results with (70), we have $\bar{x}' > x'_{md}$ if $x'_{mo} < x'_{md}$, and
$\bar{x}' < x'_{md}$ if $x'_{mo} > x'_{md}$, which are the relations given in Section IIa, for case II.





A TEST OF THE SIGNIFICANCE OF THE DIFFERENCE BETWEEN 
MEANS OF SAMPLES FROM TWO NORMAL POPULATIONS 
WITHOUT ASSUMING EQUAL VARIANCES’ 


By Daisy M. STARKEY 


1. History of the problem. If the only available evidence about two normally
distributed populations is contained in two samples, one from each, it has
hitherto been the custom to test the hypothesis that the means are equal by

= = 


assuming that the quantity Viet R572 9 distributed in Student’s distribution, 


y _ z\2 
with N + N’ — 2 degrees of freedom, where s° = (x — 4) 


N(N — 1) 
WN oy , the other notation being that used by R. A. Fisher.” The 


and k = 


hypothesis underlying this test, however, is that the variances are equal. Al- 
though in many cases this may seem a reasonable assumption to adopt concur- 
rently with that of equal means, it is undoubtedly not a necessary one, and it is, 
therefore, desirable that the test should be adapted to meet this difficulty. 

The first advance on the problem was made by W. V. Behrens³ who suggested
that the distribution of the difference of the means could be expressed in terms
of the observations in the samples from the two populations, the argument
being entirely independent of the variances. R. A. Fisher⁴ obtained
substantially the same result, but expressed the argument in terms of fiducial probability.
M. S. Bartlett⁵ was of the opinion that Behrens' reasoning was incorrect, as he
obtained some results which were apparently inconsistent with those tabulated
in Behrens' paper, but R. A. Fisher⁶ showed that Bartlett's argument
was open to criticism. In the latter work, he actually obtained distributions
for the case of two samples of two observations, and in the following we shall
indicate some extensions of this more detailed work of Fisher, firstly, to the case


1 Presented at the joint meeting of the American Mathematical Society and the Insti- 
tute of Mathematical Statistics, Indianapolis, December 30, 1937. 

Research done under a grant-in-aid from the Carnegie Corporation of New York City. 

2 Statistical Methods for Research Workers, 1925-1936.

3 "Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen," Landw. Jb. 68, 807-37
(1929).

4 "The Fiducial Argument in Statistical Inference," Annals of Eugenics, 6 (1935)
pp. 391-8.

5 "The Information Available in Small Samples," Proc. Camb. Phil. Soc. 32, pp. 560-6
(1936).

6 "On a Point Raised by M. S. Bartlett on Fiducial Probability," Annals of Eugenics 7,
Part IV, 370-5 (1937).












of other small samples of even numbers of observations, and, secondly, to samples 
of very large numbers. 





















2. The case of small samples. We recapitulate, briefly, the preliminary
argument of R. A. Fisher⁶, in which he denotes the unknown population means
by μ and μ'. Since $\frac{\bar{x} - \mu}{s} = t$, where t is distributed in Student's distribution,
we may write $\mu = \bar{x} - st$, and obtain the fiducial distribution of the population
parameter μ in the form $G_1(\mu)\,d\mu$, where

$$G_1(\mu)\,d\mu = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{\pi n}}\;\frac{d\mu}{s\left\{1 + \frac{(\bar{x}-\mu)^2}{ns^2}\right\}^{\frac{n+1}{2}}},$$

and a similar result for the fiducial distribution of μ'. The simultaneous fiducial
distribution of μ and μ' is thus $G_1(\mu)G_2(\mu')\,d\mu\,d\mu'$ from which the fiducial
distribution of μ - μ' may be found. We may note that the characteristic function
of $-(\mu - \mu') + (\bar{x} - \bar{x}')$ is M(x), where

$$M(x) = \iint e^{ix[-(\mu-\mu') + (\bar{x}-\bar{x}')]}\,G_1(\mu)G_2(\mu')\,d\mu\,d\mu' = \iint e^{ix(ts - t's')}\,H_1(t)H_2(t')\,dt\,dt',$$

where

$$H_1(t)\,dt = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{\pi n}}\;\frac{dt}{\left(1 + t^2/n\right)^{\frac{n+1}{2}}},$$

with a similar expression for $H_2(t')$. Thus from the fiducial point of view, the
problem is essentially that of formally determining the distribution of the
variate ts - t's', or at + bt', where a = s, b = -s' are regarded as constants, t
and t' being distributed in "Student's" distribution. The hypothesis μ = μ'
may then be examined by testing whether $\bar{x} - \bar{x}'$ is a significantly large value of
this variate. We shall approach this distribution problem through the use of
characteristic functions.

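By definition, the characteristic function of "Student's" distribution is
represented by the integral

$$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n+1)]}{\Gamma(\frac{1}{2}n)}\int_{-\infty}^{\infty}\frac{e^{ixt}}{\left(1 + \frac{t^2}{n}\right)^{\frac{n+1}{2}}}\,dt \tag{1}$$

and may be evaluated by three methods which will be briefly considered.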


First, by integrating the function

$$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n+1)]}{\Gamma(\frac{1}{2}n)}\;\frac{e^{ixz}}{\left(1 + \frac{z^2}{n}\right)^{\frac{n+1}{2}}}$$

around a standard semicircular contour in the upper half of the z-plane, the
value of the characteristic function is at once proved to be 2πi times the sum
of the residues of the integrand within the contour when the radius of the
semicircle becomes infinite. Within the contour there is one pole only at $z = i\sqrt{n}$.
The residue at this pole is the coefficient of 1/h in the expansion of

$$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n+1)]}{\Gamma(\frac{1}{2}n)}\;\frac{n^{\frac{n+1}{2}}\,e^{ix(i\sqrt{n}+h)}}{h^{\frac{n+1}{2}}\,(2i\sqrt{n}+h)^{\frac{n+1}{2}}}$$

in ascending powers of h, which may readily be evaluated when n is odd.
Second, by using the result that

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{ixt}}{1+t^2}\,dt = e^{-|x|},$$

from which we deduce that

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{ixt}}{a+t^2}\,dt = \frac{1}{\sqrt{a}}\,e^{-\sqrt{a}\,|x|}.$$

Differentiating this result ½(n - 1) times with respect to a, again considering odd
values of n, we have that

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{ixt}}{(a+t^2)^{\frac{n+1}{2}}}\,dt = \frac{(-1)^{\frac{n-1}{2}}}{\left[\frac{1}{2}(n-1)\right]!}\;\frac{d^{\frac{n-1}{2}}}{da^{\frac{n-1}{2}}}\left[\frac{1}{\sqrt{a}}\,e^{-\sqrt{a}\,|x|}\right].$$

By forming the first order differential equation in $y = \frac{1}{\sqrt{a}}\,e^{-\sqrt{a}\,|x|}$ and
differentiating it ½(n - 3) times using Leibnitz' theorem, we may obtain a linear relation
between the derivatives of order ½(n - 1) and lower; similarly, by differentiating
½(n - 5) times, we may obtain a linear relation between the derivatives of all
orders up to and including ½(n - 3), and continuing in this way, we obtain a set
of ½(n - 1) linear equations in the ½(n - 1) unknown derivatives. These
equations may be solved for the derivative of order ½(n - 1) by the determinant
rule. The denominator determinant is independent of x, and the numerator is
$e^{-|x|\sqrt{a}}$ multiplied by a polynomial of degree ½(n - 1) in x. Using this fact, we
may specify undetermined values for the coefficients in this polynomial, and
obtain relations between these values for two consecutive values of n by
differentiating once. The recurrence relations thus obtained may be used to verify
by mathematical induction the following relation, after substituting a = n,

$$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n+1)]}{\Gamma(\frac{1}{2}n)}\int_{-\infty}^{\infty}\frac{e^{ixt}}{\left(1+\frac{t^2}{n}\right)^{\frac{n+1}{2}}}\,dt = e^{-|x|\sqrt{n}}\left\{1 + |x|\sqrt{n} + \frac{(|x|\sqrt{n})^2}{2!}\frac{(n-3)}{(n-2)} + \frac{(|x|\sqrt{n})^3}{3!}\frac{(n-5)}{(n-2)} + \cdots\right\}, \tag{2}$$

the coefficient of $(|x|\sqrt{n})^{2k}$ being

$$\frac{1}{(2k)!}\;\frac{(n-4k+1)(n-4k+3)(n-4k+5)\cdots(n-2k-1)}{(n-2)(n-4)(n-6)\cdots(n-2k)}$$

and the coefficient of $(|x|\sqrt{n})^{2k+1}$ being

$$\frac{1}{(2k+1)!}\;\frac{(n-4k-1)(n-4k+1)\cdots(n-2k-3)}{(n-2)(n-4)\cdots(n-2k)}.$$

This is, therefore, the value of the characteristic function, and is the same in
form as the result which may be obtained by the first method. There are
evidently a finite number of terms, the degree of the polynomial being ½(n - 1).

Third, the characteristic function may be shown to satisfy the second order
differential equation

$$x\frac{d^2y}{dx^2} - (n-1)\frac{dy}{dx} - nxy = 0.$$

By change of variables $y = e^{-x\sqrt{n}}\,v$ (we assume that x is positive, as it may be
replaced by its absolute value in the integral) and $u = x\sqrt{n}$, we obtain

$$u\frac{d^2v}{du^2} - (n-1+2u)\frac{dv}{du} + (n-1)v = 0.$$

Using the Frobenius method of solution in series, we obtain as one solution when
n is odd the expression

$$v = 1 + u + \frac{u^2}{2!}\frac{(n-3)}{(n-2)} + \frac{u^3}{3!}\frac{(n-5)}{(n-2)} + \cdots,$$

and the corresponding value of y has already been proved to be the value of the
characteristic function. It is probable that the corresponding solution of the
differential equation would also be the value of the characteristic function when
n is even. In this case, however, the indicial equation has roots differing by an
integer, and the solution of the differential equation is much more complicated
in form. Nevertheless, it seems possible to find a series expansion for the
characteristic function of "Student's" distribution in this way whatever be the
value of n.
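The polynomial solution for odd n can be generated mechanically. The sketch below assumes the transformed equation is $u\,v'' - (n-1+2u)\,v' + (n-1)\,v = 0$; substituting $v = \sum c_j u^j$ then gives the recurrence $c_{j+1} = c_j\,(n-1-2j)/[(j+1)(n-1-j)]$, which is derived here for illustration and is not quoted from the paper. The function name is illustrative.

```python
from fractions import Fraction

def cf_polynomial(n):
    """Exact coefficients c_0 .. c_{(n-1)/2} of v(u) for odd n."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    coefficients = [Fraction(1)]
    for j in range((n - 1) // 2):
        # the factor (n - 1 - 2j) vanishes at j = (n-1)/2, so the series terminates
        coefficients.append(coefficients[-1] * Fraction(n - 1 - 2 * j, (j + 1) * (n - 1 - j)))
    return coefficients

for n in (1, 3, 5, 7, 9):
    print(n, cf_polynomial(n))
```

The characteristic function is then $e^{-u}$ times this polynomial with $u = |x|\sqrt{n}$; for n = 5, for example, the third coefficient equals $(n-3)/[2!\,(n-2)] = 1/3$, in agreement with the series displayed above.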

The characteristic functions of the distributions of at and bt' may now be
readily obtained by replacing x by ax and bx in the above expression.
Multiplying the characteristic functions of these two independent distributions, we
obtain the characteristic function of the distribution of at + bt', which is of the
form

$$M(x) = e^{-|x|\,(|a|\sqrt{n} + |b|\sqrt{n'})}\left(1 + |x|\,(|a|\sqrt{n} + |b|\sqrt{n'}) + \cdots\right),$$

the term in brackets being a polynomial of degree $\frac{(n + n' - 2)}{2}$. We may now
use the result that the distribution is given by the integral

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iux}\,M(x)\,dx,$$

and so obtain the distribution of u = at + bt'.

A distribution so obtained would involve four constants, a, b, n and n', and a
derived probability table would thus be very complicated. It may, however,
be simplified firstly by considering the case of equal sample numbers, and,
secondly, making the transformation

$$v = \frac{at + bt'}{|a| + |b|}, \tag{3}$$

whence the resulting distribution involves only two constants, n, and the
ratio a/b. In this case the form of the characteristic function is

$$e^{-|x|\sqrt{n}}\left\{1 + |x|\sqrt{n} + \frac{(|x|\sqrt{n})^2}{2!}\cdot\frac{(a^2+b^2)\frac{n-3}{n-2} + 2|a||b|}{(|a|+|b|)^2} + \cdots\right\}. \tag{4}$$

In determining the form of the distribution, we shall encounter integrals of the
form

$$n^{p/2}\int_{-\infty}^{\infty}e^{-ivx - |x|\sqrt{n}}\,|x|^p\,dx.$$

This can be reduced to

$$n^{p/2}\int_0^{\infty}e^{-x(\sqrt{n}+iv)}\,x^p\,dx + n^{p/2}\int_0^{\infty}e^{-x(\sqrt{n}-iv)}\,x^p\,dx,$$

and, integrating by parts, or using the Gamma Function integral, we obtain
as the value of this integral

$$p!\;n^{p/2}\left\{\frac{1}{(\sqrt{n}+iv)^{p+1}} + \frac{1}{(\sqrt{n}-iv)^{p+1}}\right\}.$$

Writing $v = \sqrt{n}\tan\theta$, this reduces to

$$\frac{2\,p!}{\sqrt{n}}\,\cos(p+1)\theta\,\cos^{p+1}\theta.$$


The distribution is thus seen to be:

$$\frac{d\theta}{\pi}\left[p_0 + p_1\cos 2\theta + p_2\cos\theta\cos 3\theta + \cdots + p_{n-1}\cos^{n-2}\theta\,\cos n\theta\right], \tag{5}$$

where

$$p_0 = 1, \quad p_1 = 1, \quad p_2 = \frac{(a^2+b^2)\frac{n-3}{n-2} + 2|a||b|}{(|a|+|b|)^2}, \ \cdots.$$

It is obvious that the values of the coefficients p may all be expressed in terms
of the ratio $\frac{|a|}{|b|}$. Denoting this ratio by r,

$$p_0 = 1, \quad p_1 = 1, \quad p_2 = \frac{(r^2+1)\frac{n-3}{n-2} + 2r}{(1+r)^2}, \ \cdots,$$

and thus we could construct a table for the probability integral involving n, r and
v only.


The process of evaluating the probability integral may be simplified by
considering the term already evaluated,

$$n^{p/2}\int_{-\infty}^{\infty}e^{-ivx - |x|\sqrt{n}}\,|x|^p\,dx.$$

Integrating this expression under the integral sign with respect to v, between
the limits v and ∞, the contribution to the probability integral from this term is
seen to be

$$n^{p/2}\int_{-\infty}^{\infty}\frac{e^{-ivx - |x|\sqrt{n}}}{ix}\,|x|^p\,dx,$$

which on the introduction of the same transformation as before, gives the value

$$-2\,(p-1)!\,\cos^p\theta\,\sin p\theta.$$

Thus, from (5), the total probability that θ should lie between $\frac{\pi}{2}$ and a given
value, θ, is

$$\frac{1}{\pi}\left[\frac{\pi}{2} - \theta - \cos\theta\sin\theta - \frac{p_2}{2}\cos^2\theta\sin 2\theta - \cdots - \frac{p_{n-1}}{n-1}\cos^{n-1}\theta\,\sin(n-1)\theta\right], \tag{6}$$

where $-\frac{\pi}{2} \le \theta \le \frac{\pi}{2}$.
The following summarises the results for small values of n. 


at + bt’ 


tan @ = lal +] ‘ 





TEST OF DIFFERENCE BETWEEN MEANS 207 


The results reduce to those already given by Fisher. The distribution is then
simply $d\theta/\pi$, or Student's distribution, and is independent of a and b, and the
probability integral is $\dfrac{1}{2} - \dfrac{\theta}{\pi}$.

n = 3.    $\tan\theta = \dfrac{at + bt'}{\sqrt{3}\,(|a| + |b|)}$.


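The distribution function is

$$\frac{d\theta}{\pi}\left[1 + \cos 2\theta + \frac{2r}{(1+r)^2}\cos\theta\cos 3\theta\right],$$

and the probability integral

$$\frac{1}{\pi}\left[\frac{\pi}{2} - \theta - \cos\theta\sin\theta - \frac{r}{(1+r)^2}\cos^{2}\theta\sin 2\theta\right].$$

n = 5.    $\tan\theta = \dfrac{at + bt'}{\sqrt{5}\,(|a| + |b|)}$.

The distribution function is

$$\frac{d\theta}{\pi}\left[1 + \cos 2\theta + \frac{2(r^2 + 1 + 3r)}{3(1+r)^2}\cos\theta\cos 3\theta + \frac{2r}{(1+r)^2}\cos^{2}\theta\cos 4\theta + \frac{8r^2}{3(1+r)^4}\cos^{3}\theta\cos 5\theta\right],$$

and the probability integral

$$\frac{1}{\pi}\left[\frac{\pi}{2} - \theta - \cos\theta\sin\theta - \frac{(r^2 + 1 + 3r)}{3(1+r)^2}\cos^{2}\theta\sin 2\theta - \frac{2r}{3(1+r)^2}\cos^{3}\theta\sin 3\theta - \frac{2r^2}{3(1+r)^4}\cos^{4}\theta\sin 4\theta\right].$$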
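The small-sample results above can be checked numerically. The following is a minimal sketch, not the author's computing scheme: it assumes the series reads $p_k\cos^{k-1}\theta\cos(k+1)\theta$ in the distribution and $(p_k/k)\cos^{k}\theta\sin k\theta$ in the probability integral, with the coefficients as tabulated for n = 1, 3, 5. The function names are hypothetical.

```python
import math

def coefficients(n, r):
    """Coefficients p_0, ..., p_{n-1} for n = 1, 3, 5 and r = |a|/|b|."""
    if n == 1:
        return [1.0]
    if n == 3:
        return [1.0, 1.0, 2 * r / (1 + r) ** 2]
    if n == 5:
        return [1.0, 1.0,
                2 * (r * r + 1 + 3 * r) / (3 * (1 + r) ** 2),
                2 * r / (1 + r) ** 2,
                8 * r * r / (3 * (1 + r) ** 4)]
    raise ValueError("only n = 1, 3, 5 are covered in this sketch")

def theta_from_statistic(u, a, b, n):
    """theta with tan(theta) = u / (sqrt(n) (|a| + |b|)), where u = a t + b t'."""
    return math.atan(u / (math.sqrt(n) * (abs(a) + abs(b))))

def density(theta, n, r):
    """Distribution function of theta (assumed reading of the series)."""
    p = coefficients(n, r)
    total = p[0]
    for k in range(1, n):
        total += p[k] * math.cos(theta) ** (k - 1) * math.cos((k + 1) * theta)
    return total / math.pi

def upper_tail(theta, n, r):
    """Probability that the angle lies between theta and pi/2."""
    p = coefficients(n, r)
    total = math.pi / 2 - theta
    for k in range(1, n):
        total -= p[k] / k * math.cos(theta) ** k * math.sin(k * theta)
    return total / math.pi
```

For a two-sided test one would take twice `upper_tail` at the absolute value of the observed angle, since the distribution is symmetrical about zero.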


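3. Samples of large numbers. The foregoing method is not suitable when
n and n' are large. In this case we use the asymptotic expansion of "Student's"
distribution which has been worked out by R. A. Fisher⁷ and is of the form,

(7)    $$f(t)\,dt = \frac{1}{\sqrt{\pi n}}\,\frac{\Gamma\!\left(\tfrac{1}{2}(n+1)\right)}{\Gamma\!\left(\tfrac{1}{2}n\right)}\,\frac{dt}{\left(1 + \dfrac{t^2}{n}\right)^{\frac{1}{2}(n+1)}} \sim \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}\left(1 + \frac{P_1}{n} + \frac{P_2}{n^2} + \cdots + \frac{P_k}{n^k} + \cdots\right)dt,$$


7 "The Expansion of 'Student's' Distribution in Powers of $n^{-1}$," Metron, Vol. 5, no. 3
(1925), pp. 22-25.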

















where $P_k$ is a polynomial of degree 4k in t, such that

$$P_1 = \frac{t^4 - 2t^2 - 1}{4}, \qquad P_2 = \frac{3t^8 - 28t^6 + 30t^4 + 12t^2 + 3}{96}, \quad \text{etc.}$$
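As an illustration of how quickly the expansion (7) approaches the exact ordinate, the sketch below compares the exact "Student" density with the normal term alone, with the term in $P_1$ added, and with the term in $P_2$ added. It is only a numerical check under the polynomials quoted above; the function names are hypothetical.

```python
import math

def student_density(t, n):
    """Exact ordinate of 'Student's' distribution with n degrees of freedom."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - (n + 1) / 2 * math.log1p(t * t / n))

def fisher_expansion(t, n, terms=2):
    """Normal ordinate times 1 + P1/n + P2/n^2, truncated after `terms` terms."""
    normal = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    p1 = (t ** 4 - 2 * t ** 2 - 1) / 4
    p2 = (3 * t ** 8 - 28 * t ** 6 + 30 * t ** 4 + 12 * t ** 2 + 3) / 96
    series = 1.0
    if terms >= 1:
        series += p1 / n
    if terms >= 2:
        series += p2 / n ** 2
    return normal * series
```

With n = 30 and t = 1 the successive errors fall by roughly two orders of magnitude per term, although, as the text goes on to stress, no such regular behaviour is guaranteed for every n and t.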
The development of an asymptotic expansion for the distribution of at + bt'
is obtained by combining the asymptotic expansions of t and t'. The theoretical
justification of the process used makes use of the following lemma:


If $R_k(t)$ is the remainder after the first (k + 1) terms in the asymptotic expansion
of "Student's" distribution in descending powers of n, then $\displaystyle\lim_{n\to\infty} n^k \int_0^{\infty} |R_k(t)|\,dt = 0$.


In the proof, the symbol "lim" will be used to denote the limit as n tends to
infinity of the quantity in question. Let $S_k(t)$ represent the sum of the first
(k + 1) terms of the above expansion. It may readily be shown that if
$0 < \delta < \tfrac{1}{2}$,

$$\lim n^k \int_{n^\delta}^{\infty} f(t)\,dt = 0,$$

and hence that

$$\lim n^k \int_{n^\delta}^{\infty} |R_k(t)|\,dt = 0.$$


Using an expansion for the logarithm of $\left(1 + \dfrac{t^2}{n}\right)^{-\frac{1}{2}(n+1)}$, and the asymptotic
expansion for the logarithm of the Gamma function, the following asymptotic
expansion may be obtained, $\log f(t) = -\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}t^2 + w$, where

(8)    $$w = \frac{1}{4n}\left(t^4 - 2t^2 - 1\right) + \frac{1}{12n^2}\left(-2t^6 + 3t^4\right) + \cdots + \frac{G_{2p+2}}{n^p} + R_p + R_p',$$

$G_{2p+2}$ being a polynomial of degree 2p+2, and


pets af??*4 
|Rp| < Dnt ~ D+ Dar where 0 <a <1, 
gee 
'Tp41| = (p + 1)nPH 
saa 7A B2”*°A4 
iniigeniaiatiaia = - a ] 
Fel < GE DD +2 — DF I)p+ Ann’ Where <B <1, 


A being a constant independent of n. 



































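Thus, using Taylor's expansion, we obtain

$$f(t) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}\left(1 + w + \frac{w^2}{2!} + \cdots + \frac{w^k}{k!} + \frac{w^{k+1}}{(k+1)!}\,e^{\theta w}\right),$$

where $0 < \theta < 1$.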












Evidently R,(t) is of the form 


1 ef? Qk+1 Gk+2 Qk (p+2) wr" ef 
(9) Fa[¢ (ae + nkt2 oe nk (p+?) + (k - + yr , 
the quantities g being polynomials in ¢. 
Using the moments of the normal distribution, it may readily be shown that 




























i 8 
f . « [* La? ( Get q@k+2 Qk (p+ 2) a 
lim n , Je € (4 + nef +--+ + ay dt = 0. 
In the range of integration, when n is sufficiently large, it is evident that 
ee eT 
n n- nP 
+ oe +|Tya|= OM") if0<5<} 
wt! Ow —}e2 | Kn’ a 

C Voz(k + 1)! eve” | ate ni~@° where K and K’ are constants. 

and hence 
C 6K’ 


lim nt [ | Ri (t) | dt < lim — er =-0, ifd< 
0 


ni- —55—4K5 


<6 


We can also deduce the following results:

1. Since the value of the integrand is unaltered if t is replaced by $-t$, we have
at once

$$\lim n^k \int_{-\infty}^{0} |R_k(t)|\,dt = 0.$$

2. Using both of these results it follows that

$$\lim n^k \int_{-\infty}^{\infty} |R_k(t)|\,dt = 0.$$

3. Hence

$$\lim n^k \int_{t}^{t'} |R_k(t)|\,dt = 0,$$

where t and t' have any real values, and thus it is legitimate to integrate the
asymptotic expansion of f(t) term by term with respect to t between any given
limits.




4. If $\phi(t)$ is a function independent of n which is bounded for all values of t,
the asymptotic expansion of $f(t)\phi(t)$ in terms of n may be integrated term by
term with respect to t. In particular, if $\phi(t) = e^{itx}$, an asymptotic expansion
for the characteristic function of "Student's" distribution may be obtained.

We may now consider the form of the distribution of at + bt', and in order
to simplify the argument, the following reasoning applies to the case in which
the sample numbers are equal, although a similar theory may be developed for
sample numbers which are of equal orders of magnitude. We may write

$$f(t) = S_k(t) + R_k(t),$$
$$f(t') = S_k(t') + R_k(t'),$$
$$u = at + bt',$$

and hence $t' = \dfrac{u - at}{b}$. The joint distribution of u and t is therefore

$$\left[S_k(t)\,S_k\!\left(\frac{u-at}{b}\right) + R_k(t)\,S_k\!\left(\frac{u-at}{b}\right) + S_k(t)\,R_k\!\left(\frac{u-at}{b}\right) + R_k(t)\,R_k\!\left(\frac{u-at}{b}\right)\right]\frac{dt\,du}{b}.$$


The distribution of u is obtained by integrating this expression with respect to t
over all the possible values of t between $-\infty$ and $+\infty$. The product
$S_k(t)\,S_k\!\left(\dfrac{u-at}{b}\right)$ gives the first k + 1 terms in the asymptotic expansion which
is the product of the asymptotic expansions of $f(t)$ and $f\!\left(\dfrac{u-at}{b}\right)$, and a
remainder $\phi(t)$, where


(10) 6) =o CS) (2 +7 +.. 


neti | yk+ 


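$\psi_1, \psi_2, \cdots, \psi_k$ being polynomials in t. Let

$$R_k'(t) = \phi(t) + R_k(t)\,S_k\!\left(\frac{u-at}{b}\right) + R_k\!\left(\frac{u-at}{b}\right)S_k(t) + R_k(t)\,R_k\!\left(\frac{u-at}{b}\right).$$

Using the expressions for the moments of the normal distribution, it may be
shown that $\displaystyle\int_{-\infty}^{\infty} |\phi(t)|\,dt = o(n^{-k})$. Let the upper bound of the bounded
function $S_k(t)$ for all values of n and t be B. Then

$$\int_{-\infty}^{\infty} \left|S_k\!\left(\frac{u-at}{b}\right) R_k(t)\right| dt < B\int_{-\infty}^{\infty} |R_k(t)|\,dt = o(n^{-k}).$$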







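Similarly

$$\int_{-\infty}^{\infty} \left|S_k(t)\,R_k\!\left(\frac{u-at}{b}\right)\right| dt = o(n^{-k}), \qquad \int_{-\infty}^{\infty} \left|R_k(t)\,R_k\!\left(\frac{u-at}{b}\right)\right| dt = o(n^{-k}).$$

Therefore

$$\lim n^k \int_{-\infty}^{\infty} |R_k'(t)|\,dt = 0,$$

and hence the distribution of u may be obtained by integrating the asymptotic
expansion which is the product of the asymptotic expansions of $f(t)$ and $f\!\left(\dfrac{u-at}{b}\right)$
term by term.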
In practice, it is convenient to find the distribution of

(11)    $$w = \frac{at + bt'}{\sqrt{a^2 + b^2}}.$$

We substitute $y = \dfrac{bt - at'}{\sqrt{a^2 + b^2}}$ and, using the above result, it follows that the
distribution of w is given by

$$\frac{dw}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(w^2 + y^2)}\left\{1 + \frac{1}{4n}\left[\frac{(aw+by)^4}{(a^2+b^2)^2} - \frac{2(aw+by)^2}{a^2+b^2} - 1 + \frac{(bw-ay)^4}{(a^2+b^2)^2} - \frac{2(bw-ay)^2}{a^2+b^2} - 1\right] + \cdots\right\}dy,$$

which is equal to

(12)    $$\frac{dw}{\sqrt{2\pi}}\,e^{-\frac{1}{2}w^2}\left\{1 + \frac{1}{4n}\left[\frac{w^4(a^4+b^4) + 12w^2a^2b^2 + 3(a^4+b^4)}{(a^2+b^2)^2} - 4 - 2w^2\right] + \cdots\right\}.$$

It may be noticed that this distribution may be expressed in terms of the
ratio a/b only. The probability integral may readily be obtained. There is
no theoretical difficulty involved in obtaining any desired number of terms of
this expansion, but they rapidly become too complicated to handle with any
ease. Moreover, it is difficult to find a limit of the error committed in using
any given number of terms of the series for the probability integral as an
approximation to the value of this integral, as the somewhat complicated
method of obtaining the series masks the form of the remainder. While it is
undoubtedly true that when n is very large the distribution approaches
normality, and for a somewhat lower range of values of n the first two terms of
the expansion should be taken, etc., it is difficult to forecast the number of 
terms which should be retained for any given value of n. In fact the same 
difficulty seems to exist when using the original asymptotic expansion of 
“Student’s” distribution for the calculation of probabilities. For instance, the 
coefficients of the powers of t which occur in the sixth term of the asymptotic
expansion of the probability integral are larger than those occurring in the 
fifth term, and, in consequence, in spite of the greater power of n in the denomi- 
nator, for certain values of n these may contribute more to the probability 
integral than the previous term. We are unable to say anything about the 
aggregate of succeeding terms in general, and, therefore, it does not seem 
legitimate to drop all the terms following a term which yields a contribution 
beyond the limit of accuracy desired. This difficulty is even more apparent 
in the case in which the coefficients of the various powers of t occurring in the
terms beyond the first involve also the ratio a/b, and it is probable that the 
different values of this ratio which are possible would lead to different numbers 
of terms being taken for the same value of n in order to gain the same degree 
of precision in the probability integral. 
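A short numerical sketch may help in judging the two-term approximation (12). It assumes the bracket in (12) is $\left[w^4(a^4+b^4) + 12w^2a^2b^2 + 3(a^4+b^4)\right]/(a^2+b^2)^2 - 4 - 2w^2$, which is what term-by-term integration of the product of the two expansions over y gives; the function name is hypothetical.

```python
import math

def w_density(w, a, b, n):
    """First two terms of the expansion (12) for w = (a t + b t') / sqrt(a^2 + b^2)."""
    c2 = a * a + b * b
    normal = math.exp(-w * w / 2) / math.sqrt(2 * math.pi)
    quartic = (w ** 4 * (a ** 4 + b ** 4)
               + 12 * w * w * a * a * b * b
               + 3 * (a ** 4 + b ** 4)) / c2 ** 2
    return normal * (1 + (quartic - 4 - 2 * w * w) / (4 * n))
```

Three properties follow directly from this form: the correction term integrates to zero, so the total frequency remains unity; putting b = 0 recovers the first two terms of (7); and multiplying a and b by a common factor leaves the value unchanged, in agreement with the remark that only the ratio a/b matters.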


4. The distributions of the test quantities which correspond to (3) and (11) 
for equal means, when the ratio of the variances is a known quantity. When 
the ratio ¢ of the variances is given, the foregoing arguments, which are inde- 
pendent of the parameters specifying the distribution, may no longer be applied, 
for this would be information not supplied by the sample. In this case, the 
distributions of the test quantities which have been used take forms which 
depend only on the ratio of the variances, and are independent of the sample 
estimates of the variances. 

The quantity (3), used in §2, when n was a small odd number, was —— = 9, 


6 € + s’ 
where s° = Ni = NOT and n = N — 1. On the assumption 
of equal population means, the distribution of this quantity takes the form 

2r(n + 3) dv V8 (/ gz + 1)" "(V6 —- 2)"" 
(13) ——_ .... &) rr: 
r(3) Vn (1 +¢)"" ~ ve G +i + “) 


Thus in the case n = 1, we obtain 


dz. 


_ dv | Ve 4 a | 
rit+v?L@+1+¢)! @’+¢+1))’ 
which is the result given by R. A. Fisher.” The integral may be evaluated in 
terms of elementary functions for small odd integral values of n. 


In §3, (11), the distribution of the statistic w = “— *_ was considered 


V8 + 8” 







when N was large. The exact distribution may be proved to be 


e, 3 


at, (n+ 2)n"1 + 9)" _d ( 1a __n(1 - ¢*) ) 
(14) 6° * le +n to) rin) NT ™ ep +a +6) 


where F is the hypergeometric function. If ¢ = 1, we have the limiting case 
in which the argument of the hypergeometric function is zero, and obtain 
“Student’s’”’ distribution, which is to be expected in view of the evidence stated 
in §1, the numbers in the samples being equal. 


COLUMBIA UNIVERSITY.





SOME EFFICIENT MEASURES OF RELATIVE DISPERSION¹
By Nilan Norris


For some time it has been known that the coefficient of variation (in the sense 
of the ratio of the standard deviation to the arithmetic mean) is not an efficient 
statistic for distributions departing materially from normality.² At various
times there have been proposed certain supplementary estimates of relative 
variation, such as those involving ratios between sums and differences of upper 
and lower quartiles, and ratios of mean deviations to medians or to arithmetic 
means. Some of these have appeared in certain textbooks.³ But there appears
to have been no attempt to found their use on considerations of minimum 
sampling variance. 

The point of departure of this paper is that of using the Method of Maximum 
Likelihood to derive two efficient measures of relative dispersion, together with 
expressions for their standard errors. These optimum estimates of true or 
parametric variation are the ratio of the arithmetic mean to the geometric 
mean (the arithmetic-geometric ratio) for Pearson Type III distributions, 
and the ratio of the geometric mean to the harmonic mean (the geometric- 
harmonic ratio) for Pearson Type V distributions. The usefulness of these 
measures is suggested by the generalized-mean-value-function approach to the 
analysis of averages, especially the theorem of inequalities among averages.⁴


1 Presented before a joint meeting of the American Statistical Association and the 
Institute of Mathematical Statistics at Chicago, Illinois on December 28, 1936.

2 The term “‘efficient statistic’ is used here in the sense of R. A. Fisher, that is, of a 
parameter-estimate which tends towards normality of distribution with the least possible 
standard deviation. For a discussion of the inefficiency of certain commonly used statistics
as applied to distributions departing from normality, see R. A. Fisher, ‘‘On the Mathemati- 
cal Foundations of Theoretical Statistics,’’ Philosophical Transactions of the Royal Society 
of London, Series A, Vol. 222, 1922, pp. 332-336. 

3 See, for example, William Vernon Lovitt and Henry F. Holtzclaw, Statistics (Prentice- 
Hall, Inc., New York, 1929), p. 134; Herbert Arkin and Raymond R. Colton, Statistical 
Methods (Barnes and Noble, Inc. New York, 1935), revised ed., p. 41; and Herbert Soren- 
son, Statistics for Students in Psychology and Education (McGraw-Hill Book Company, 
Inc., New York, 1936), pp. 153 f. 

4 Nilan Norris, ‘‘Inequalities among Averages,’ Annals of Mathematical Statistics, 
Vol. VI, No. 1, March, 1935, pp. 27-29; and ‘‘Convexity Properties of Generalized Mean 
Value Functions,’’ Annals of Mathematical Statistics, Vol. VIII, No. 2, June, 1937, pp. 118- 
120. Professor John B. Canning appears to have been the first to point out the possibility 
of making use of certain ratio measures of relative variation. See ‘‘The Income Concept 
and Certain of Its Applications,’’ Papers and Proceedings of the Eleventh Annual Conference 
of the Pacific Coast Economic Association (Edwards Brothers, Ann Arbor, 1933), p. 64. 




This theorem states that if $t_1 < t_2$ then $\phi(t_1) < \phi(t_2)$, where the unit weight or
simple sample type of generalized mean value function is defined as

(1)    $$\phi(t) = \left(\frac{x_1^t + x_2^t + \cdots + x_n^t}{n}\right)^{1/t}.$$

The $x_i$ are restricted to positive real numbers not all equal, but t may take any
real value. A necessary and sufficient condition that $\phi(-\infty) = \phi(t) = \phi(\infty)$
is the excluded trivial case that $x_1 = x_2 = \cdots = x_n$. When the $x_i$ are not all
equal, the ratios between various pairs of averages as generated by $\phi(t)$ yield
ratio measures of relative dispersion, the usefulness of which depends, in part,
on their efficiency as estimates of population-characterizing constants
(parameters). The arithmetic-geometric ratio may be written $\dfrac{\phi(1)}{\phi(0)} = \dfrac{A}{G}$, and the
geometric-harmonic ratio may be written $\dfrac{\phi(0)}{\phi(-1)} = \dfrac{G}{H}$. In certain cases it
may be of convenience to reverse the order of each of the ratios. The standard
errors for the two forms which each of the ratios may assume are presented
below.
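The generalized mean value function (1) and the two ratio measures can be computed directly. The sketch below is illustrative only; it treats $\phi(0)$ as the limiting case, the geometric mean, and the function names are hypothetical.

```python
import math

def generalized_mean(xs, t):
    """phi(t) of equation (1); phi(0) is taken as its limit, the geometric mean."""
    if any(x <= 0 for x in xs):
        raise ValueError("the x_i must be positive")
    n = len(xs)
    if t == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** t for x in xs) / n) ** (1.0 / t)

def arithmetic_geometric_ratio(xs):
    """A/G = phi(1)/phi(0)."""
    return generalized_mean(xs, 1) / generalized_mean(xs, 0)

def geometric_harmonic_ratio(xs):
    """G/H = phi(0)/phi(-1)."""
    return generalized_mean(xs, 0) / generalized_mean(xs, -1)
```

For the sample 1, 2, 4 the three averages are A = 7/3, G = 2 and H = 12/7, so that both ratios happen to equal 7/6; both exceed unity whenever the observations are not all equal.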

The demonstration that these ratio measures of relative dispersion are 100% 
efficient statistics for their appropriate distributions, and the derivation of use- 
ful expressions for their respective standard errors both may be accomplished by 


the ordinary method of differentiating the logarithm of the likelihood. 


Let digamma of x = $F_D(x) = \dfrac{d}{dx}\log x!$,

and trigamma of x = $F_T(x) = \dfrac{d^2}{dx^2}\log x!$.

For Pearson Type III distributions, the frequency with which the variate x
falls into the range dx is given by

(2)    $$df = \frac{1}{p!}\left(\frac{x}{a}\right)^{p} e^{-\frac{x}{a}}\,\frac{dx}{a}.$$


The parameter a measures the absolute dispersion of the distribution, and the 
parameter p determines the general shape of the frequency curve. The relative 
variation may be regarded as a population parameter, @, defined as the ratio 
of the population arithmetic mean to the population geometric mean. Let the 
logarithm of the likelihood for this distribution be represented by L, we have 


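(3)    $$L = -n\log p! - n(p+1)\log a + p\,\Sigma\log x_i - \frac{1}{a}\,\Sigma x_i,$$

where the summation is taken over the n individuals of the sample. It follows
that

(4)    $$\frac{\partial L}{\partial a} = -\frac{n}{a}(p+1) + \frac{1}{a^2}\,\Sigma x_i; \quad\text{and}\quad \frac{\partial^2 L}{\partial a^2} = \frac{n}{a^2}(p+1) - \frac{2}{a^3}\,\Sigma x_i.$$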










When L is maximized with respect to a by equating to zero the first derivative 
of L with respect to a, we find 


(5)    $$\frac{\Sigma x_i}{n} = a(p+1).$$





It also follows that 


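(6)    $$\frac{\partial L}{\partial p} = -nF_D(p) - n\log a + \Sigma\log x_i; \quad \frac{\partial^2 L}{\partial p^2} = -nF_T(p); \quad\text{and}\quad \frac{\partial^2 L}{\partial a\,\partial p} = \frac{\partial^2 L}{\partial p\,\partial a} = -\frac{n}{a}.$$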


When L is maximized with respect to p by equating to zero the first derivative 
of L with respect to p, we find 















(7)    $$(\Pi x_i)^{\frac{1}{n}} = a\,e^{F_D(p)}.$$

The optimum estimate $\hat{p}$ of p is therefore found from (5) and (7) to be given by
the equation

(8)    $$(\hat{p}+1)\,e^{-F_D(\hat{p})} = \frac{\Sigma x_i}{n\,(\Pi x_i)^{\frac{1}{n}}} = \frac{A}{G}.$$

But $(p+1)\,e^{-F_D(p)}$ is the parameter $\theta$. Hence we find the optimum estimate of
$\theta$ to be $\dfrac{A}{G}$, which can be expressed in terms of the generalized mean value
function as $\dfrac{\phi(1)}{\phi(0)}$. Therefore, for distributions well graduated by a Type III curve
the optimum estimate of $\theta$, the ratio of the arithmetic mean to the geometric
mean, is given by $\dfrac{A}{G}$.
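Equations (5) and (8) determine the estimates of p and a from the sample arithmetic and geometric means. The sketch below solves (8) by bisection; it is an illustration under stated assumptions, not a procedure given in the paper. It uses the identity that the digamma function of this paper, $\frac{d}{dp}\log p!$, equals the usual $\psi(p+1)$, and evaluates $\psi$ by the standard recurrence and asymptotic series. All function names are hypothetical.

```python
import math

def digamma(x):
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def theta_of_p(p):
    """theta = (p + 1) exp(-F(p)), with F(p) = d/dp log p! = psi(p + 1)."""
    return (p + 1.0) * math.exp(-digamma(p + 1.0))

def solve_p(ratio, lo=-0.99):
    """Solve theta_of_p(p) = ratio; theta decreases from large values towards 1."""
    if ratio <= 1.0 or ratio >= theta_of_p(lo):
        raise ValueError("ratio outside the range handled by this sketch")
    hi = 1.0
    while theta_of_p(hi) > ratio:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if theta_of_p(mid) > ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_type3(xs):
    """Return (p_hat, a_hat, A/G) from equations (5) and (8)."""
    n = len(xs)
    arithmetic = sum(xs) / n
    geometric = math.exp(sum(math.log(x) for x in xs) / n)
    ratio = arithmetic / geometric
    p_hat = solve_p(ratio)
    a_hat = arithmetic / (p_hat + 1.0)
    return p_hat, a_hat, ratio
```

Bisection is used only because $\theta$ is a monotonic function of p; tabulated values of the digamma function would serve equally well.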

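If only p is being estimated (a given), the variance, or square of the standard
deviation, of $\hat{p}$ is obtained from $\dfrac{\partial^2 L}{\partial p^2}$, and is $V(\hat{p}) = \dfrac{1}{nF_T(p)}$. To a first
approximation, the variance of $\dfrac{A}{G}$, the estimate of $\theta$, is found from the usual relation
between the variance of a function and the variance of the argument, namely

(9)    $$V[f(x)] = \left[\frac{df}{dx}\right]^2 V(x).$$

Since

(10)    $$\frac{d\theta}{dp} = \theta\left[\frac{1}{p+1} - F_T(p)\right],$$

therefore

(11)    $$V\!\left(\frac{A}{G}\right) = \frac{\theta^2\left[F_T(p) - \dfrac{1}{p+1}\right]^2}{nF_T(p)},$$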
















or the standard error of $\dfrac{A}{G}$ is the square root of the last expression, if only p is
being estimated. If it is more convenient to do so, one may reverse the terms
in the ratio to obtain

(12)    $$V\!\left(\frac{G}{A}\right) = \frac{1}{\theta^2}\,\frac{\left[F_T(p) - \dfrac{1}{p+1}\right]^2}{nF_T(p)},$$

and extract the square root of the last expression to obtain the standard error
of $\dfrac{G}{A}$.
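If a and p are being estimated simultaneously, there exists the matrix of
negative mean values

(13)    $$\begin{Vmatrix} -E\!\left(\dfrac{\partial^2 L}{\partial a^2}\right) & -E\!\left(\dfrac{\partial^2 L}{\partial a\,\partial p}\right) \\[2ex] -E\!\left(\dfrac{\partial^2 L}{\partial a\,\partial p}\right) & -E\!\left(\dfrac{\partial^2 L}{\partial p^2}\right) \end{Vmatrix} = \begin{Vmatrix} \dfrac{n}{a^2}(p+1) & \dfrac{n}{a} \\[2ex] \dfrac{n}{a} & nF_T(p) \end{Vmatrix},$$

from which the variance of $\dfrac{A}{G}$ can be computed. In fact we have

(14)    $$V(\hat{p}) = \frac{p+1}{n\left[(p+1)F_T(p) - 1\right]} = \frac{1}{n\left[F_T(p) - \dfrac{1}{p+1}\right]},$$

and consequently

(15)    $$V\!\left(\frac{A}{G}\right) = \frac{\theta^2}{n}\left[F_T(p) - \frac{1}{p+1}\right].$$

The standard error of $\dfrac{A}{G}$ is equal to the square root of the expression in (15),
if both a and p are being estimated. If the terms in the ratio are reversed,
one obtains

(16)    $$V\!\left(\frac{G}{A}\right) = \frac{1}{n\theta^2}\left[F_T(p) - \frac{1}{p+1}\right].$$

The square root of the last expression may be taken to derive the standard
error of $\dfrac{G}{A}$. Since the digamma and the trigamma functions have been
tabulated for considerable ranges,⁵ these standard error formulae, and those
developed below for the Type V case should be quite useful.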
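Where tables of the trigamma function are not at hand, the standard errors (11), (12), (15) and (16) can be evaluated numerically. The following is a sketch under the assumption that the trigamma function of this paper, $\frac{d^2}{dp^2}\log p!$, equals the usual $\psi'(p+1)$; it is evaluated here by recurrence and the standard asymptotic series. Function names and arguments are hypothetical.

```python
import math

def trigamma(x):
    """psi'(x) for x > 0: upward recurrence, then the asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return shift + inv + inv2 / 2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))

def se_ag_ratio(theta, p, n, a_known=False):
    """Standard error of A/G: equation (11) if a is given, (15) otherwise."""
    f_t = trigamma(p + 1.0)            # d^2/dp^2 log p!
    excess = f_t - 1.0 / (p + 1.0)
    if a_known:
        variance = theta ** 2 * excess ** 2 / (n * f_t)
    else:
        variance = theta ** 2 * excess / n
    return math.sqrt(variance)

def se_ga_ratio(theta, p, n, a_known=False):
    """Standard error of the reversed ratio G/A: equations (12) and (16)."""
    return se_ag_ratio(theta, p, n, a_known) / theta ** 2
```

In use, the sample values of A/G and of the fitted p would be substituted for $\theta$ and p. The standard error with a given is always the smaller of the two, since the ratio of (11) to (15) is $\left[F_T(p) - \frac{1}{p+1}\right]/F_T(p)$, which is less than unity.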


5 British Association for the Advancement of Science: Mathematical Tables (Office of the 
British Association, London, 1931), Vol. I, pp. 42-51. 










For Pearson Type V distributions, the frequency with which the variate x
falls into the range dx is given by

(17)    $$df = \frac{1}{p!}\left(\frac{a}{x}\right)^{p+2} e^{-\frac{a}{x}}\,\frac{dx}{a}.$$


The parameter a measures the absolute dispersion of the distribution, and the 
parameter p determines the general shape of the frequency curve. The relative 
dispersion may be regarded as a population parameter, 6’, defined as the ratio 
of the population geometric mean to the population harmonic mean. Let the 
logarithm of the likelihood for this distribution be represented by L. Then 














(18)    $$L = -n\log p! + n(p+1)\log a - (p+2)\,\Sigma\log x_i - a\,\Sigma\frac{1}{x_i},$$

















the summation being taken over the sample of n individuals. It follows that 






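(19)    $$\frac{\partial L}{\partial p} = -nF_D(p) + n\log a - \Sigma\log x_i;$$
$$\frac{\partial^2 L}{\partial p^2} = -nF_T(p);$$
$$\frac{\partial L}{\partial a} = \frac{n}{a}(p+1) - \Sigma\frac{1}{x_i};$$
$$\frac{\partial^2 L}{\partial a^2} = -\frac{n}{a^2}(p+1);$$
$$\frac{\partial^2 L}{\partial p\,\partial a} = \frac{\partial^2 L}{\partial a\,\partial p} = \frac{n}{a}.$$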


Let L be maximized with respect to p to derive the geometric mean, and let L be
maximized with respect to a to derive $\phi(-1)$, or H, the harmonic mean. It is
clear that for the Type V distribution, the relative dispersion, as we have defined
it, is the population parameter $\theta' = (p+1)\,e^{-F_D(p)}$. Therefore, if $\phi(0) = G = (\Pi x_i)^{\frac{1}{n}}$,
and $\phi(-1) = H = \dfrac{1}{\dfrac{1}{n}\,\Sigma\dfrac{1}{x_i}}$, it follows, by an argument similar to that used in the
case of a Type III curve, that the geometric-harmonic ratio, $\dfrac{G}{H}$, is an optimum
estimate of the parameter $\theta'$, for distributions well graduated by the Pearson
Type V curve.
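The Type V result can be checked by a small sampling experiment. The sketch below is illustrative only. It assumes that a Type V variate of the form (17) may be generated as a divided by a Type III (gamma) variate with index p + 1, and that the population ratio of geometric to harmonic mean is $(p+1)e^{-\psi(p+1)}$, which is what setting the first derivatives in (19) to zero gives. Function names and the seed are arbitrary choices.

```python
import math
import random

def digamma(x):
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def type5_population_ratio(p):
    """Geometric mean a exp(-psi(p+1)) over harmonic mean a/(p+1); a cancels."""
    return (p + 1.0) * math.exp(-digamma(p + 1.0))

def sample_gh_ratio(p, a, n, seed=0):
    """G/H for a pseudo-random Type V sample of n, drawn as a / gamma variate."""
    rng = random.Random(seed)
    xs = [a / rng.gammavariate(p + 1.0, 1.0) for _ in range(n)]
    geometric = math.exp(sum(math.log(x) for x in xs) / n)
    harmonic = n / sum(1.0 / x for x in xs)
    return geometric / harmonic
```

With p = 3 the population ratio is about 1.14 whatever the value of a, and a sample of some thousands reproduces it to two decimal places.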


If only p is being estimated, the variance of $\hat{p}$ is given by $V(\hat{p}) = \dfrac{1}{nF_T(p)}$, and

(20)    $$V\!\left(\frac{G}{H}\right) = \frac{\theta'^2\left[F_T(p) - \dfrac{1}{p+1}\right]^2}{nF_T(p)},$$

or the standard error of $\dfrac{G}{H}$ is the square root of the last expression, if p alone is
being estimated, a being given. If the terms in the ratio are reversed,

(21)    $$V\!\left(\frac{H}{G}\right) = \frac{1}{\theta'^2}\,\frac{\left[F_T(p) - \dfrac{1}{p+1}\right]^2}{nF_T(p)},$$

and the square root of the last expression yields the standard error of $\dfrac{H}{G}$.


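If a and p are being estimated simultaneously, there exists the matrix

(22)    $$\begin{Vmatrix} -E\!\left(\dfrac{\partial^2 L}{\partial a^2}\right) & -E\!\left(\dfrac{\partial^2 L}{\partial a\,\partial p}\right) \\[2ex] -E\!\left(\dfrac{\partial^2 L}{\partial a\,\partial p}\right) & -E\!\left(\dfrac{\partial^2 L}{\partial p^2}\right) \end{Vmatrix} = \begin{Vmatrix} \dfrac{n}{a^2}(p+1) & -\dfrac{n}{a} \\[2ex] -\dfrac{n}{a} & nF_T(p) \end{Vmatrix},$$

from which the variance of $\dfrac{G}{H}$ can be found. In fact

(23)    $$V(\hat{p}) = \frac{1}{n\left[F_T(p) - \dfrac{1}{p+1}\right]},$$

and hence

(24)    $$V(\hat{\theta}') = \frac{\theta'^2}{n}\left[F_T(p) - \frac{1}{p+1}\right].$$

The standard error of $\dfrac{G}{H}$ is then given by the square root of the expression for
$V(\hat{\theta}')$. If the terms in the ratio are reversed,

(25)    $$V\!\left(\frac{H}{G}\right) = \frac{1}{n\theta'^2}\left[F_T(p) - \frac{1}{p+1}\right],$$

the square root of which yields the standard error of $\dfrac{H}{G}$.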

Just as the coefficient of variation is an efficient statistic only for distributions 
well graduated by the normal, or Pearson Type VII curve, so also the two maxi- 
mum likelihood estimates of relative dispersion herein developed are efficient 
only when applied to their appropriate distributions. One may expect to 
obtain an optimum degree of efficiency only when the arithmetic-geometric 
ratio is used for series well specified by the Type III function, and the geometric- 
harmonic ratio is used for series well specified by the Type V function. 

It may be recalled that Karl Pearson proposed the use of the coefficient of 
variation late in the nineteenth century.⁶ Since that time there appears to
have been some tendency to rely on it as a measure of relative variation, regard- 

6 “Regression, Heredity, and Panmixia,’’ Philosophical Transactions of the Royal Society 
of London, Series A, Vol. 187, 1896, p. 277. For materials pertaining to the Pearson-Thorn- 
dike controversy resulting from the latter’s suggestion that the ratio of the standard 
deviation to the square root of the arithmetic mean is often a more suitable device than is 







less of whether or not it extracts from the sample a relatively large amount of 
the pertinent information concerning the parent population.⁷ There are several
cases in which the coefficient of variation is not an optimum estimate of relative 
dispersion. For example, in a comparison of the true or parametric variation 
of the weights of humans of given age levels, the arithmetic-geometric ratio is 
often the appropriate statistic to use, since weights tend to be distributed 
according to the Pearson Type III law. Frequently the distribution of weights 
is very well graduated by the Type V function, if the origin is fixed at 0 in 
advance. Although this procedure yields a special two-parameter Type V 
function, the principle of using the geometric-harmonic ratio as an optimum 
estimate of relative dispersion is still valid. Again, in a comparison of the 
relative variation of the personal distribution of wealth and income in certain 
modern countries, the arithmetic-geometric ratio will be found to have a smaller 
sampling variance than that of the coefficient of variation, since the personal 
distribution of wealth and income in these countries tends to be in accordance 
with the Type III law, rather than the normal law. Similarly, the distribution 
of the number of trials required to obtain r successes of an event having a given 
probability usually follows the Type III function, and requires the use of the 
arithmetic-geometric ratio, if the maximum amount of the relevant information 
is to be extracted from the sample. 

It seems clear that in practice the usefulness of the arithmetic-geometric 
ratio and the geometric-harmonic ratio will depend on the type of the distribu- 
tion with which one is dealing, and on the extent to which added efficiency is 
desired. In certain cases there is doubtless room for some difference of opinion 
as to whether or not the degree of added efficiency achieved by the use of these 
maximum likelihood estimates of relative dispersion will merit departing from 
the use of such a time-honored statistic as the coefficient of variation. If one is 
interested in avoiding the assumption of normality implicit in methods cus- 
tomarily used in the more general problem of analysis of variance, an alternative 
is the use of ranks.⁸ Although the efficiency of these rank-correlation methods
is not always 100%, their economy of effort is sometimes a great advantage. 


HUNTER COLLEGE OF THE CITY OF NEW YORK.





the coefficient of variation see Edward L. Thorndike, ‘‘Empirical Studies in the Theory of 
Measurement,’’ Archives of Psychology (The Science Press, New York, 1907), Vol. I, No. 3, 
April, 1907, pp. 9-13; and An Introduction to the Theory of Mental and Social Measurements 
(Teachers College, Columbia University, New York, 1913), 2d. ed. pp. 133 f., or Ist. ed., 
1904, pp. 102 f. See also Helen M. Walker, Studies in the History of Statistical Method 
(The Williams and Wilkins Company, Baltimore, 1929), p. 178. 

7Cf. Walter A. Hendricks and Kate W. Robey, ‘‘The Sampling Distribution of the 
Coefficient of Variation,’’ Annals of Mathematical Statistics, Vol. VII, No. 4, December, 
1936, pp. 129-132. 

8 Harold Hotelling and Margaret Richards Pabst, ‘‘Rank Correlation and Tests of 
Significance Involving No Assumption of Normality,’’ Annals of Mathematical Statistics, 
Vol. VII, No. 1, March, 1936, pp. 29-43. See also Milton Friedman, ‘“The Use of Ranks to 
Avoid the Assumption of Normality Implicit in the Analysis of Variance,’’ Journal of the 
American Statistical Association, Vol. 32, No. 200, December, 1937, pp. 675-701. 





NOTES ON THE DISTRIBUTION OF THE GEOMETRIC MEAN¹


By Burton H. Camp 


There are two transformation theorems which apply particularly well to the 
distribution of a product and therefore to the distribution of the geometric 
mean of a sample. Both are implicit in the known theory of the transformation
of integrals, but it is useful to state them in forms which are especially adapted 
to probability theory. Several examples will be considered in which distribu- 
tions of the geometric mean will be derived by using these theorems. 

The first theorem may be stated as 


THEOREM A: Let the point set q in an N-dimensional u-space be defined so that
in q a given function of the u's, $F(u_1, u_2, \cdots, u_N)$, has the property that

(1)    $$\xi \le F < \xi + d\xi.$$

Let $\bar{q}$ be the elementary volume of the point set q defined as an N-tuple integral

$$\int_q du_1 \cdots du_N$$

taken over q, having a value of order $d\xi$. Let

(2)    $$u_i = \theta(t_i), \qquad i = 1, 2, \cdots, N$$

be continuous and differentiable monotonic functions of the t's with unique inverses

(3)    $$t_i = \theta^{-1}(u_i).$$

Let r be the point set in t-space corresponding to q in u-space under the
transformation (2) with elementary volume given by the integral

(4)    $$\bar{r} = \int_r dt_1 \cdots dt_N.$$

If $J(\xi)$ is defined as $\dfrac{dt_1}{du_1} \cdots \dfrac{dt_N}{du_N}$ at a point in q for which $F = \xi$, and if, for all points
in q,

(5)    $$\left|\frac{dt_1}{du_1} \cdots \frac{dt_N}{du_N} - J(\xi)\right| < M \cdot d\xi,$$

where M is a constant, independent of q, then the volume $\bar{r}$ is, except for terms of
order $(d\xi)^2$, given by

(6)    $$\bar{q}\,|J(\xi)|.$$


1 Read at a joint meeting of the American Mathematical Society and the Institute of 
Mathematical Statistics, Indianapolis, December 30, 1937. 




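The proof is immediate for we have

$$\bar{r} = \int_r dt_1 \cdots dt_N = \int_q \left|\frac{dt_1}{du_1} \cdots \frac{dt_N}{du_N}\right| du_1 \cdots du_N$$
$$= \int_q \left[\,\left|\frac{dt_1}{du_1} \cdots \frac{dt_N}{du_N}\right| - |J(\xi)|\,\right] du_1 \cdots du_N + \bar{q}\,|J(\xi)|.$$

But, by (5), the integral in the last line has a value less than $\bar{q}\,M \cdot d\xi$, and $\bar{q}$ is of
order $d\xi$. Therefore $\bar{r}$ differs from $\bar{q}\,|J(\xi)|$ by terms of order $(d\xi)^2$.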

Let us now apply this theorem to a simple case. The volume of the set q,
where $\xi < u_1 + \cdots + u_N < \xi + d\xi$, $u_i < a$, $i = 1, \cdots, N$, can easily be shown
to be

$$\bar{q} = C(Na - \xi)^{N-1}\,d\xi.$$

Let $u_i = \log t_i$. Then it follows from the theorem that

$$\bar{r} = K\,e^{\xi}(Na - \xi)^{N-1}\,d\xi,$$

$\bar{r}$ being the volume of the point set r, where

(7)    $$\xi < \log (t_1 \cdots t_N) < \xi + d\xi.$$


By the use of (7) one can now use the geometrical method of finding the
probability distribution of the geometric mean,

(8)    $$x = (t_1 \cdots t_N)^{1/N},$$

of samples of N from the universe $\phi(t)\,dt$, provided that $\phi(t_1) \cdots \phi(t_N)$ is a
continuous function of $\xi$. Unfortunately there do not appear to be many such
$\phi$ functions. One that is of interest is

$$\phi(t)\,dt = k\,t^{s}\,dt, \qquad 0 < t < e^{a}.$$

Let $D(\xi)\,d\xi$ represent the distribution of $\xi$. We have

$$D(\xi)\,d\xi = \int_r \phi(t_1) \cdots \phi(t_N)\,dt_1 \cdots dt_N = \int_r k^N (t_1 \cdots t_N)^{s}\,dt_1 \cdots dt_N$$
$$= k^N e^{s\xi}\,\bar{r} = C\,e^{(s+1)\xi}(Na - \xi)^{N-1}\,d\xi.$$

Thence we obtain as the distribution of x:

$$f(x)\,dx = C_1\,x^{sN+N-1}(a - \log x)^{N-1}\,dx.$$

The form of f(x) in the special case in which s = 0 and $\phi$ is a rectangle has been
found by other authors,² and is

$$f(x)\,dx = C_2\,x^{N-1}(a - \log x)^{N-1}\,dx.$$
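The rectangular case lends itself to a direct numerical check. In the sketch below the constant is taken as $C_2 = N^N e^{-Na}/\Gamma(N)$; this value is not stated in the text and is an assumption, obtained by noting that $-\log(t_i e^{-a})$ is exponentially distributed when $t_i$ is rectangular on $(0, e^a)$. The function name is hypothetical.

```python
import math

def geometric_mean_density(x, N, a=0.0):
    """f(x) = C2 x^(N-1) (a - log x)^(N-1) for a rectangular universe on (0, e^a)."""
    if not 0.0 < x < math.exp(a):
        return 0.0
    y = a - math.log(x)
    log_c2 = N * math.log(N) - math.lgamma(N) - N * a   # assumed constant
    return math.exp(log_c2 + (N - 1) * math.log(x) + (N - 1) * math.log(y))
```

Two checks are available: the total frequency should be unity, and the mean of x should equal $e^{a}\left(\frac{N}{N+1}\right)^{N}$, because the expected value of $t^{1/N}$ for a single rectangular observation is $e^{a/N}\,N/(N+1)$.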


2 E.g. see S. Kullback, ‘‘An application of characteristic functions to the distribution 
problem of statistics,’’ Annals of Mathematical Statistics, vol. 5 (1934), pp. 263-270. 







The second transformation theorem to be used may be stated as

THEOREM B: Let $\psi(u)\,du$ be the probability element for a given universe and let
the sample $(u_1, u_2, \cdots, u_N)$ be taken. Let the statistic $\xi = \gamma(u_1, u_2, \cdots, u_N)$ have
the distribution $F(\xi)\,d\xi$. If the transformation (2), satisfying the conditions
imposed on it in Theorem A, be applied both to the universe and to the statistic, yielding
$\phi(t)\,dt$ and $\xi = g(t_1, \cdots, t_N)$ respectively, then the element of distribution of $\xi$, as
obtained from $\phi$, is, as before, $F(\xi)\,d\xi$.

The proof is straightforward, for the distribution of $\xi$, as obtained from $\psi(u)\,du$
is given by

$$\int_q \psi(u_1) \cdots \psi(u_N)\,du_1 \cdots du_N$$

and, as obtained from $\phi(t)\,dt$, it is

$$\int_r \phi(t_1) \cdots \phi(t_N)\,dt_1 \cdots dt_N$$

where q is the set in u-space where $\xi < \gamma < \xi + d\xi$ and r is the set in t-space
where $\xi < g < \xi + d\xi$. It is clear that these two integrals have the same value
because of the relation

$$\psi(u)\,du = \psi(\theta(t))\,\frac{d\theta}{dt}\,dt = \phi(t)\,dt$$

and the unique correspondence between the points of q and r set up by the
transformation (2), with its unique inverse (3).

This theorem is particularly well adapted to the derivation of the distribution 
of the geometric mean because of the simple logarithmic transformation con- 
necting the sum and the product of N numbers, and because several distributions 
of the sum are already known. Two of these cases will now be presented. 

EXAMPLE 1. Let x be the geometric mean (8) of the sample of N from a
universe with distribution law

(9)    $$\phi(t)\,dt = \frac{(\log t)^{p-1}}{t^2\,\Gamma(p)}\,dt \qquad (t > 1).$$

Then the distribution of x is

(10)    $$f(x)\,dx = \frac{N^{Np}(\log x)^{Np-1}}{x^{N+1}\,\Gamma(Np)}\,dx \qquad (x > 1),$$

and it is to be noticed that x has the same type of distribution as t.

To prove (10), first let $\xi = (u_1 + \cdots + u_N)/N$, where the u's are a sample
from a Type III universe,

$$\psi(u)\,du = \frac{e^{-u}u^{p-1}}{\Gamma(p)}\,du \qquad (u > 0).$$

Irwin³ has shown that the distribution of $\xi$ is

(11)    $$F(\xi)\,d\xi = \frac{N e^{-N\xi}(N\xi)^{Np-1}}{\Gamma(Np)}\,d\xi.$$

Making the transformation u = log t, we have

$$\xi = \log (t_1 \cdots t_N)^{1/N}, \qquad \phi(t)\,dt = \frac{(\log t)^{p-1}}{t^2\,\Gamma(p)}\,dt, \quad (t > 1),$$

and $F(\xi)\,d\xi$ is unchanged. We now obtain f(x)dx by substituting $\xi = \log x$ in
(11).
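The passage from (11) to (10) is a change of variable, and may be verified numerically. The sketch below assumes the forms $F(\xi) = N e^{-N\xi}(N\xi)^{Np-1}/\Gamma(Np)$ and $f(x) = N^{Np}(\log x)^{Np-1}/\left(x^{N+1}\Gamma(Np)\right)$; the function names are hypothetical.

```python
import math

def F_xi(xi, N, p):
    """Distribution (11) of the arithmetic mean of N Type III variates."""
    if xi <= 0.0:
        return 0.0
    return math.exp(math.log(N) - N * xi
                    + (N * p - 1) * math.log(N * xi) - math.lgamma(N * p))

def f_x(x, N, p):
    """Distribution (10) of the geometric mean x = exp(xi)."""
    if x <= 1.0:
        return 0.0
    return math.exp(N * p * math.log(N)
                    + (N * p - 1) * math.log(math.log(x))
                    - (N + 1) * math.log(x) - math.lgamma(N * p))
```

Since $d\xi = dx/x$, the two functions should satisfy $x\,f(x) = F(\log x)$ at every point, and the total frequency under either should be unity.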
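EXAMPLE⁴ 2. If x is the geometric mean (8) of a sample of N from a universe
whose distribution is

(12)    $$\phi(t)\,dt = \frac{1}{ct\sqrt{2\pi}}\,e^{-\frac{1}{2c^2}\left(\log\frac{t}{G}\right)^2}\,dt, \qquad (c, t, G > 0),$$

the distribution of x is

(13)    $$f(x)\,dx = \frac{\sqrt{N}}{cx\sqrt{2\pi}}\,e^{-\frac{N}{2c^2}\left(\log\frac{x}{G}\right)^2}\,dx, \qquad (x > 0).$$

To prove this, one begins with the arithmetic mean $\xi$ and the universe,

$$\psi(u)\,du = \frac{1}{c\sqrt{2\pi}}\,e^{-\frac{1}{2c^2}(u-m)^2}\,du. \quad\text{Here}\quad F(\xi)\,d\xi = \frac{\sqrt{N}}{c\sqrt{2\pi}}\,e^{-\frac{N}{2c^2}(\xi-m)^2}\,d\xi.$$

Again using u = log t, one obtains $\xi = \log (t_1 \cdots t_N)^{1/N}$ and

$$\phi(t)\,dt = \frac{1}{tc\sqrt{2\pi}}\,e^{-\frac{1}{2c^2}\left(\log\frac{t}{G}\right)^2}\,dt, \quad\text{where } G = e^{m} > 0,$$

and $F(\xi)\,d\xi$ is unchanged. To get (13) one substitutes $\xi = \log x$ in $F(\xi)\,d\xi$.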
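A sampling experiment illustrates the result (13). The sketch below is illustrative only: it draws samples from the universe (12) with the standard library generator, taking $m = \log G$ as the mean of log t, and examines the logarithm of the geometric mean, which by (13) should be normally distributed about m with standard deviation $c/\sqrt{N}$. Function names, parameter values and the seed are arbitrary.

```python
import math
import random
import statistics

def simulate_log_geometric_means(N, m, c, reps, seed=0):
    """Logarithms of the geometric means of `reps` samples of N from (12)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(reps):
        sample = [rng.lognormvariate(m, c) for _ in range(N)]
        logs.append(sum(math.log(t) for t in sample) / N)
    return logs

def geometric_mean_density(x, N, G, c):
    """Ordinate of (13) at x > 0."""
    z = math.log(x / G)
    return (math.sqrt(N) / (c * x * math.sqrt(2 * math.pi))
            * math.exp(-N * z * z / (2 * c * c)))
```

With N = 5, m = 0.3 and c = 0.8 the observed standard deviation of log x should be close to 0.8/√5, or about 0.36, the only change from the universe being the replacement of c by $c/\sqrt{N}$.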
Again it follows that the geometric mean has the same distribution as the
universe except for a change in one of the parameters (c). This frequency curve
has other interesting features. It was developed by Galton and McAlister⁵ by
quite a different method and was called the curve of equal facility. They were
seeking for a distribution $\phi(t)$ which would have the characteristic that, if t and
t' were two observations differing from G by the same relative amount, (G − t)/t
= (t' − G)/G, they would have equal probabilities. McAlister noted various
properties of $\phi$, including the fact that G was actually its geometric mean, and
that it was not the same as the mode or the arithmetic mean. Certain properties
which he did not mention are the following:
(i) If one draws a sample from a universe with the distribution $\phi$ in order to


3 Biometrika, vol. 19 (1927), p. 229; see also A. Church, Biometrika, vol. 18 (1926), p. 336. 
4 This distribution can also be obtained by the method of A. T. Craig, American Journal
of Mathematics, vol. 54 (1932), p. 362, but it would be difficult to evaluate his integral
without the substitution which would be suggested if the distribution were known.
5 Proceedings of the Royal Society, vol. 29 (1879), pp. 365, 367.

















determine $G$, the geometric mean of the universe, the maximum likelihood
solution is $x$, the geometric mean of the sample.

(ii) The modal point of the sampling distribution ($f$) approaches $G$ as a limit
as $N$ becomes infinite.

(iii) One can devise a function $s$ of the sample analogous to but different from
Student's $s$, and show that $x/s$ has a distribution independent of the parameters
$G$ and $c$ of the universe. To do this it is necessary first to extend the second
transformation theorem so as to include cases where the number of statistics
(functions of the sample) being obtained simultaneously is greater than one.
This is not difficult, but since the analogous tests for significance have been
developed for the normal universe it would not be particularly useful, for if the
observations are distributed in accordance with $\phi(t)$ their logarithms are
distributed normally, and their logarithms can equally well be used for testing
significance.

(iv) If one uses the curve of equal facility instead of the normal curve as the
distribution of biological lengths, then any power of such lengths, in particular
the third power, which is supposed to be approximately proportional to weights,
would also be distributed in the same manner, except for a change in the
parameters. This is a property which the normal curve does not have. It raises
the question: Can biological lengths be represented by the curve of equal
facility? The remainder of this paper will be devoted to a discussion of this
question and cognate matters.
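
Property (i) can be illustrated numerically. The sketch below is not from the original note: the sample values, the value of $c$, and the function names are hypothetical. It evaluates the logarithm of the likelihood of $G$ under (12), with $c$ held fixed, and confirms that no nearby trial value of $G$ does better than the sample geometric mean.

```python
import math


def log_likelihood(sample, g, c):
    """Log-likelihood of G (with c fixed) for a sample from the curve of equal facility (12)."""
    total = 0.0
    for t in sample:
        z = math.log(t / g)
        total += -math.log(t * c * math.sqrt(2.0 * math.pi)) - z * z / (2.0 * c * c)
    return total


def geometric_mean(sample):
    """Geometric mean, computed through the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(t) for t in sample) / len(sample))


if __name__ == "__main__":
    sample = [61.2, 64.8, 66.0, 67.5, 68.1, 70.3, 72.9]   # hypothetical lengths
    x = geometric_mean(sample)
    for factor in (0.99, 1.0, 1.01):
        print(factor, log_likelihood(sample, factor * x, c=0.05))
```

The middle line of output, for the geometric mean itself, carries the largest log-likelihood, because the sum of squared deviations of the logarithms is least when taken about their own mean.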

The curve of equal facility may be made to approach as a limit the normal 
curve if the origin be moved indefinitely to the left. This is almost intuitively 
evident from a consideration of the hypotheses under which the two curves were 
derived by Galton and McAlister. It is also indicated by the behavior of the 
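lower moments. Let $\nu_i$ refer to the $i$th moment of (12) relative to the origin
of $t$, $\mu_i$ to the corresponding moment relative to the arithmetic mean. It is
easy to show that

$$\nu_i = G^{i} e^{i^{2}c^{2}/2}, \quad i = 0, 1, \cdots, \qquad \nu_1 = \bar{t} = Gh, \quad \text{where } h = e^{c^{2}/2}, \tag{14}$$

$$\mu_2 = G^{2}h^{2}(h^{2} - 1), \qquad \mu_3 = G^{3}h^{3}(h^{6} - 3h^{2} + 2), \qquad \mu_4 = G^{4}h^{4}(h^{12} - 4h^{6} + 6h^{2} - 3), \tag{15}$$

$$\begin{aligned}
\alpha_3 &= \mu_3/\mu_2^{3/2} = (h^{2} + 2)(h^{2} - 1)^{1/2}, \\
\alpha_4 &= \mu_4/\mu_2^{2} = (h^{2} - 1)^{4} + 6(h^{2} - 1)^{3} + 15(h^{2} - 1)^{2} + 16(h^{2} - 1) + 3.
\end{aligned} \tag{16}$$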


From (16) it follows that as $h$ approaches unity $\alpha_3$ and $\alpha_4$ approach their normal
values, 0 and 3, respectively. If at the same time $\mu_2$ is kept constant, it follows
from (15) that $G^{2}$ and therefore $\bar{t}$ become infinite. So the origin is moved an
infinite distance to the left.
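
The closed forms (15) and (16) can be checked against the raw moments (14) by the usual binomial expansions. The following sketch is offered only as such a check; the function names and the trial values of $G$ and $c$ are assumptions made for illustration.

```python
import math


def raw_moment(i, g, c):
    """nu_i of (14): the i-th moment of (12) about the origin of t."""
    return g ** i * math.exp(i * i * c * c / 2.0)


def central_moments(g, c):
    """mu_2, mu_3, mu_4 obtained from the raw moments by binomial expansion."""
    n1, n2, n3, n4 = (raw_moment(i, g, c) for i in (1, 2, 3, 4))
    mu2 = n2 - n1 ** 2
    mu3 = n3 - 3 * n1 * n2 + 2 * n1 ** 3
    mu4 = n4 - 4 * n1 * n3 + 6 * n1 ** 2 * n2 - 3 * n1 ** 4
    return mu2, mu3, mu4


def alpha3(h):
    """Closed form for alpha_3 in (16)."""
    return (h * h + 2.0) * math.sqrt(h * h - 1.0)


def alpha4(h):
    """Closed form for alpha_4 in (16)."""
    d = h * h - 1.0
    return d ** 4 + 6 * d ** 3 + 15 * d ** 2 + 16 * d + 3.0


if __name__ == "__main__":
    g, c = 2.0, 0.3                      # hypothetical parameter values
    h = math.exp(c * c / 2.0)
    mu2, mu3, mu4 = central_moments(g, c)
    print(mu3 / mu2 ** 1.5, alpha3(h))   # the two columns should agree
    print(mu4 / mu2 ** 2, alpha4(h))
    print(alpha3(1.0000001), alpha4(1.0000001))   # near 0 and 3 as h -> 1
```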

The question, then, whether the curve of equal facility may be used equally 
well with the normal curve to represent biological lengths depends on whether 
in practical cases the natural choice of origin, which is the position indicated by 













zero length, is such as to make the two curves practically indistinguishable.
This is apparently the situation in the case of human statures. For 8585 adult
males born in the British Isles⁶ the values of the several constants, obtained by
so fitting $\phi(t)$ to the observations that the mean and standard deviations agree,
are as follows: $\bar{t} = 67.46$ in., $G = 67.411$, $\sigma = 2.56$, $h = 1.00072$, observed
$\alpha_3 = 0.0125$, $\alpha_3$ for $\phi = 0.11$; observed $\alpha_4 = 3.149$, $\alpha_4$ for $\phi = 3.02$. Thus for the
curve of equal facility $\alpha_3$ is further from the observed value than for the normal
curve, but $\alpha_4$ is nearer to its observed value. In both cases the difference is
unimportant. A graph of both curves⁷ would not make it clear to the eye which
of the two fitted the data better.
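
The fitted constants quoted for the statures follow from (14) to (16) once the mean and standard deviation are given, since $\nu_1 = Gh$ and $\mu_2 = G^{2}h^{2}(h^{2} - 1)$ imply $(\sigma/\bar{t})^{2} = h^{2} - 1$. The sketch below reworks that arithmetic; the function names are illustrative and not from the original.

```python
import math


def fit_equal_facility(mean, sd):
    """Choose G, h and c so that curve (12) has the given mean and standard deviation.

    Uses nu_1 = G h and mu_2 = G^2 h^2 (h^2 - 1), hence (sd / mean)^2 = h^2 - 1.
    """
    h = math.sqrt(1.0 + (sd / mean) ** 2)
    g = mean / h
    c = math.sqrt(2.0 * math.log(h))     # because h = exp(c^2 / 2)
    return g, h, c


def alphas(h):
    """alpha_3 and alpha_4 of the fitted curve, from (16)."""
    d = h * h - 1.0
    a3 = (h * h + 2.0) * math.sqrt(d)
    a4 = d ** 4 + 6 * d ** 3 + 15 * d ** 2 + 16 * d + 3.0
    return a3, a4


g, h, c = fit_equal_facility(67.46, 2.56)   # mean and s.d. of the statures in the text
a3, a4 = alphas(h)
print(round(g, 3), round(h, 5), round(a3, 2), round(a4, 2))
```

With the mean 67.46 and standard deviation 2.56 this reproduces, to the figures given, the values $G = 67.411$, $h = 1.00072$, $\alpha_3 = 0.11$ and $\alpha_4 = 3.02$ stated above.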
















It would be expected that the distribution of the cubes of these statures, being
roughly proportional to the weights of the men, would not be normally
distributed. This also can be verified easily, for the distribution of ($y = t^{k}$) from
$\phi(t)\,dt$ is $\phi(y)\,dy$ except that $ck$ replaces $c$, and $G^{k}$ replaces $G$. So the distribution
of cubes is:

$$\frac{1}{3cy\sqrt{2\pi}}\, e^{-\frac{1}{18c^{2}}\left(\log \frac{y}{G^{3}}\right)^{2}}\,dy.$$

If this curve is fitted to the cubes of the statures, $\alpha_3 = 0.23$, and $\alpha_4 = 3.21$.
Both are considerably further from their normal values than before. For this


case the corresponding value of $h$ is 1.0064. It is the closeness of this quantity
to unity, or in other words the smallness of the coefficient of variation,
$100\,\sigma/\bar{t} = 100\,(h^{2} - 1)^{1/2}$, which determines how close the curve is to the normal.
For the statures $\sigma/\bar{t} = 0.0379$. For the cubes of the statures⁸ $\sigma/\bar{t} = 0.269$. Its
values in certain other cases⁹ are: length of forearm 0.05, chest circumference
0.08, strength of grip 0.26, visual acuity 0.39. It appears to be evident,
therefore, that for many types of biometric measurements, especially lengths, which
we know can be represented well by the normal curve, the curve of equal facility 
is practically just as good. In a given case it may fit a little better or a little 
worse. If we wish the distribution of the arithmetic mean as obtained by 
sampling from such data we may find it by supposing the universe normal; 
if we wish the distribution of the geometric mean we may find it by supposing 
the universe of a curve of equal facility. This device of substituting for the 
normal curve another type of curve which is equally good in practical cases, in 
order to find the distribution of a statistic which cannot be found easily for the 
normal curve, may perhaps be useful also for other statistics than the geometric 
mean. 
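
The power property used above for the cubes, namely that $y = t^{k}$ follows the same curve with $ck$ in place of $c$ and $G^{k}$ in place of $G$, can be confirmed by a direct change of variable. The sketch below does this numerically at a few points; the function names and parameter values are hypothetical and serve only as an illustration.

```python
import math


def phi(t, g, c):
    """Curve of equal facility (12) with parameters G = g and c."""
    z = math.log(t / g)
    return math.exp(-z * z / (2.0 * c * c)) / (t * c * math.sqrt(2.0 * math.pi))


def density_of_power(y, k, g, c):
    """Density of y = t**k obtained from phi(t) by change of variable."""
    t = y ** (1.0 / k)
    dt_dy = t / (k * y)                  # derivative of y**(1/k) with respect to y
    return phi(t, g, c) * dt_dy


if __name__ == "__main__":
    g, c, k = 67.4, 0.04, 3              # hypothetical values
    for y in (250000.0, 306000.0, 360000.0):
        # The two columns should coincide: same curve with c*k and G**k.
        print(density_of_power(y, k, g, c), phi(y, g ** k, k * c))
```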






















WESLEYAN UNIVERSITY. 

6 G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, London,
1937, pp. 94, 116, 157, 163, 187.

7 Such as on page 187, Yule and Kendall.

8 For the weights of a similar group of men $\sigma/\bar{t} = 0.137$, and thus the two curves would
be more nearly alike if fitted to weights than if fitted to the cubes of these statures.

9 From a long list with values ranging from 0.0049 to 0.5058, compiled by Raymond
Pearl, Medical Biometry and Statistics, Philadelphia (1930), pp. 347-9.








NOTE ON A FORMULA FOR THE MULTIPLE CORRELATION 
COEFFICIENT 


By H. M. Bacon 


There are many useful formulas available for the calculation of the multiple
correlation coefficient in a $k$ variable problem.¹ Since it frequently happens
that the regression equation is the primary object of the statistical analysis,
the well known formula


$$r^{2}_{1.23\cdots k} = \beta_{12.34\cdots k}\, r_{12} + \beta_{13.24\cdots k}\, r_{13} + \cdots + \beta_{1k.23\cdots(k-1)}\, r_{1k}$$


can be used to considerable advantage. While many different demonstrations 
of this formula are perfectly familiar, the one given in this note may prove 
of some interest. 

First let us recapitulate briefly certain facts about the regression coefficients 
and the multiple correlation coefficient. Suppose we have k sets of N numbers 
each: 

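$$\begin{matrix}
X_{11} & X_{12} & \cdots & X_{1N} \\
X_{21} & X_{22} & \cdots & X_{2N} \\
\cdots & \cdots & \cdots & \cdots \\
X_{k1} & X_{k2} & \cdots & X_{kN}.
\end{matrix}$$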


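Let $\bar{X}_j$ be the mean of the $j$-th set, and let $x_{ji} = X_{ji} - \bar{X}_j$. We then have $k$
sets of $N$ deviations from means, and we shall suppose the following $k$ sets to
be linearly independent:

$$\begin{matrix}
x_{11} & x_{12} & \cdots & x_{1N} \\
x_{21} & x_{22} & \cdots & x_{2N} \\
\cdots & \cdots & \cdots & \cdots \\
x_{k1} & x_{k2} & \cdots & x_{kN}.
\end{matrix}$$

We shall consider only the regression of the "variable" $x_1$ upon $x_2, x_3, \cdots, x_k$.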
Clearly the results obtained can be made to describe the regression of any one 
of the variables upon the other k — 1 variables by rearranging the subscripts. 
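As usual let $\lambda_2, \lambda_3, \cdots, \lambda_k$ have values which will make the sum of squares


$$F(\lambda_2, \lambda_3, \cdots, \lambda_k) = \sum (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki})^{2}$$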


a minimum. For simplicity we shall omit stating limits of summation and
understand hereafter that $\sum$ means "sum for $i$ from $i = 1$ to $i = N$." Necessary
conditions (which are easily shown to be sufficient) are that $\lambda_2, \lambda_3, \cdots, \lambda_k$
must satisfy the equations

$$\begin{aligned}
\frac{\partial F}{\partial \lambda_2} &= -2\sum (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki})\, x_{2i} = 0 \\
\frac{\partial F}{\partial \lambda_3} &= -2\sum (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki})\, x_{3i} = 0 \\
&\;\;\cdots \\
\frac{\partial F}{\partial \lambda_k} &= -2\sum (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki})\, x_{ki} = 0.
\end{aligned} \tag{1}$$

1 For example, see W. J. Kirkham, "Note on the Derivation of the Multiple Correlation
Coefficient", The Annals of Mathematical Statistics, Volume VIII (1937), pp. 68-71.

These equations are simply the "normal equations" for determining the regression
coefficients. Solving them we obtain

$$\lambda_2 = b_{12.34\cdots k}, \qquad \lambda_3 = b_{13.24\cdots k}, \qquad \cdots, \qquad \lambda_k = b_{1k.23\cdots(k-1)}.$$

The equation of regression of $x_1$ on $x_2, x_3, \cdots, x_k$ is therefore

$$x_1 = b_{12.34\cdots k}\, x_2 + b_{13.24\cdots k}\, x_3 + \cdots + b_{1k.23\cdots(k-1)}\, x_k.$$

If we let

$$u_i = b_{12.34\cdots k}\, x_{2i} + b_{13.24\cdots k}\, x_{3i} + \cdots + b_{1k.23\cdots(k-1)}\, x_{ki}$$

for $i = 1, 2, \cdots, N$ then $x_{1i} - u_i$ is the residual of the $i$-th $x_1$. The coefficient
of multiple correlation of $x_1$ in terms of $x_2, x_3, \cdots, x_k$ is defined to be
the simple correlation coefficient of the $x_1$'s and $u$'s:

$$r_{1.23\cdots k} = \frac{\sum x_{1i} u_i}{\sqrt{\sum x_{1i}^{2}\, \sum u_i^{2}}}.$$


In case it is desired to express the x’s in terms of their standard deviations, 
the following equation is used: 









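$$\frac{x_1}{\sigma_1} = \beta_{12.34\cdots k}\,\frac{x_2}{\sigma_2} + \beta_{13.24\cdots k}\,\frac{x_3}{\sigma_3} + \cdots + \beta_{1k.23\cdots(k-1)}\,\frac{x_k}{\sigma_k}$$

or

$$z_1 = \beta_{12.34\cdots k}\, z_2 + \beta_{13.24\cdots k}\, z_3 + \cdots + \beta_{1k.23\cdots(k-1)}\, z_k$$

where

$$\beta_{12.34\cdots k} = b_{12.34\cdots k}\,\frac{\sigma_2}{\sigma_1}, \qquad \cdots, \qquad \beta_{1k.23\cdots(k-1)} = b_{1k.23\cdots(k-1)}\,\frac{\sigma_k}{\sigma_1}, \tag{2}$$

and

$$z_j = \frac{x_j}{\sigma_j}.$$
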


Now if $\sum A_i B_i = 0$, the set of numbers $A_1, A_2, \cdots, A_N$ is said to be orthogonal
to the set of numbers $B_1, B_2, \cdots, B_N$. Hence the conditions of equations (1)
may be described by saying that the values of $\lambda_2, \lambda_3, \cdots, \lambda_k$ must be such
that the set of residuals $x_{1i} - u_i$ is orthogonal to each of the $k - 1$ sets of
numbers $x_{2i}, x_{3i}, \cdots, x_{ki}$. But if the set of residuals is orthogonal to each
of these sets, it is orthogonal to any linear combination of them. Since the
set of $u$'s is such a linear combination, we have

$$\sum (x_{1i} - u_i)\, u_i = 0$$

and hence

$$\sum x_{1i} u_i = \sum u_i^{2}. \tag{3}$$
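
The orthogonality argument leading to (3) can be seen at work in the smallest case, $k = 3$. The following sketch is an illustration only: the data are hypothetical, the function names are not from the original, and the two normal equations are solved by Cramer's rule rather than by any method the note prescribes.

```python
def center(values):
    """Deviations from the mean, as in x_ji = X_ji - mean of the j-th set."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    """Sum of products over i = 1..N."""
    return sum(p * q for p, q in zip(a, b))


def fit_two_regressors(x1, x2, x3):
    """Solve the two normal equations (1) for lambda_2 and lambda_3 by Cramer's rule."""
    s22, s23, s33 = dot(x2, x2), dot(x2, x3), dot(x3, x3)
    s12, s13 = dot(x1, x2), dot(x1, x3)
    det = s22 * s33 - s23 * s23          # nonzero when x2 and x3 are linearly independent
    b2 = (s12 * s33 - s13 * s23) / det
    b3 = (s13 * s22 - s12 * s23) / det
    return b2, b3


if __name__ == "__main__":
    # Hypothetical observations, not taken from the note.
    x1 = center([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])
    x2 = center([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    x3 = center([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
    b2, b3 = fit_two_regressors(x1, x2, x3)
    u = [b2 * p + b3 * q for p, q in zip(x2, x3)]
    resid = [a - b for a, b in zip(x1, u)]
    print(dot(resid, x2), dot(resid, x3), dot(resid, u))   # all zero up to rounding
    print(dot(x1, u), dot(u, u))                           # equal, as in (3)
```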


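Since $u_i = b_{12.34\cdots k}\, x_{2i} + b_{13.24\cdots k}\, x_{3i} + \cdots + b_{1k.23\cdots(k-1)}\, x_{ki}$ it follows at once
by multiplying both sides by $x_{1i}$ and summing that

$$\sum x_{1i} u_i = b_{12.34\cdots k} \sum x_{1i}x_{2i} + b_{13.24\cdots k} \sum x_{1i}x_{3i} + \cdots + b_{1k.23\cdots(k-1)} \sum x_{1i}x_{ki}. \tag{4}$$

Writing

$$\sum x_{1i}x_{2i} = N\sigma_1\sigma_2 r_{12}, \qquad \sum x_{1i}x_{3i} = N\sigma_1\sigma_3 r_{13}, \qquad \cdots, \qquad \sum x_{1i}x_{ki} = N\sigma_1\sigma_k r_{1k},$$

noting the relations between the $b$'s and the $\beta$'s expressed in equations (2),
and observing that we may write

$$\sum x_{1i} u_i = \frac{\left(\sum x_{1i} u_i\right)^{2}}{\sum x_{1i} u_i} = \frac{\left(\sum x_{1i} u_i\right)^{2}}{\sum u_i^{2}}$$

because of equation (3), we may therefore rewrite equation (4) as follows

$$\frac{\left(\sum x_{1i} u_i\right)^{2}}{\sum u_i^{2}} = \beta_{12.34\cdots k}\,\frac{\sigma_1}{\sigma_2}\, N\sigma_1\sigma_2 r_{12} + \beta_{13.24\cdots k}\,\frac{\sigma_1}{\sigma_3}\, N\sigma_1\sigma_3 r_{13} + \cdots + \beta_{1k.23\cdots(k-1)}\,\frac{\sigma_1}{\sigma_k}\, N\sigma_1\sigma_k r_{1k}.$$

Now divide both sides by $\sum x_{1i}^{2} = N\sigma_1^{2}$ obtaining

$$r^{2}_{1.23\cdots k} = \frac{\left(\sum x_{1i} u_i\right)^{2}}{\sum x_{1i}^{2}\, \sum u_i^{2}} = \beta_{12.34\cdots k}\, r_{12} + \beta_{13.24\cdots k}\, r_{13} + \cdots + \beta_{1k.23\cdots(k-1)}\, r_{1k}.$$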


This is the formula which was to be established. 
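
The identity just established can be verified numerically for any $k$. The sketch below is offered only as such a verification: the data are hypothetical, the function names are not from the original, and Gaussian elimination is one possible way, chosen here for self-containment, of solving the normal equations (1). It computes the squared multiple correlation from its definition and compares it with the sum of the products $\beta_{1j}\, r_{1j}$.

```python
import math


def center(values):
    """Deviations from the mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    """Sum of products over i = 1..N."""
    return sum(p * q for p, q in zip(a, b))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multiple_correlation_check(raw):
    """raw[0] is the set X_1, raw[1:] are X_2..X_k; returns (r^2, sum of beta_1j * r_1j)."""
    x = [center(v) for v in raw]
    x1, others = x[0], x[1:]
    n = len(x1)
    # Normal equations (1): sum_j b_j * sum(x_j x_m) = sum(x_1 x_m) for m = 2..k.
    a = [[dot(p, q) for q in others] for p in others]
    rhs = [dot(x1, p) for p in others]
    b = solve(a, rhs)
    u = [sum(bj * xj[i] for bj, xj in zip(b, others)) for i in range(n)]
    r_squared = dot(x1, u) ** 2 / (dot(x1, x1) * dot(u, u))
    sigma = [math.sqrt(dot(v, v) / n) for v in x]
    total = 0.0
    for j, (bj, xj) in enumerate(zip(b, others), start=1):
        beta = bj * sigma[j] / sigma[0]                  # relation (2)
        r1j = dot(x1, xj) / (n * sigma[0] * sigma[j])    # simple correlation r_1j
        total += beta * r1j
    return r_squared, total


if __name__ == "__main__":
    raw = [  # hypothetical data: k = 4 sets of N = 8 numbers
        [3.1, 4.9, 7.2, 8.8, 11.1, 12.7, 15.2, 16.9],
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
        [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0],
        [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0],
    ]
    print(multiple_correlation_check(raw))   # the two numbers should agree
```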


STANFORD UNIVERSITY. 





